Predict Bike Sharing Demand with AutoGluon Template¶
Project: Predict Bike Sharing Demand with AutoGluon¶
This notebook is a template with each step that you need to complete for the project.
Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.
Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all cell outputs are correctly rendered before exporting.
File-> Export Notebook As... -> Export Notebook as HTML
There is also a writeup to complete after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either Markdown or PDF.
Completing the code template and writeup template will cover all of the rubric points for this project.
The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. These suggestions are optional; if you decide to pursue them, include the code in this notebook and discuss the results in the writeup file.
Step 1: Create an account with Kaggle¶
Create Kaggle Account and download API key¶
Below is an example of the steps to get the API username and key. Each student will have their own username and key.
- Open account settings.
- Scroll down to API and click Create New API Token.
- Open up kaggle.json and use the username and key.
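The kaggle CLI looks for these credentials in ~/.kaggle/kaggle.json. As a sketch, the file can be written from within the notebook; the helper name save_kaggle_token and the placeholder values below are illustrative, so substitute the username and key from your own downloaded kaggle.json:

```python
import json
from pathlib import Path

def save_kaggle_token(username: str, key: str,
                      config_dir: Path = Path.home() / ".kaggle") -> Path:
    """Write kaggle.json where the kaggle CLI expects it."""
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    token_path.write_text(json.dumps({"username": username, "key": key}))
    # The kaggle CLI warns about (or refuses) world-readable credentials.
    token_path.chmod(0o600)
    return token_path

# Example with placeholder values -- replace with your own credentials:
# save_kaggle_token("your-username", "your-key")
```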
Step 2: Download the Kaggle dataset using the kaggle python library¶
Open up Sagemaker Studio and use starter template¶
- Notebook should be using an ml.t3.medium instance (2 vCPU + 4 GiB)
- Notebook should be using kernel: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)
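A quick way to confirm the kernel and instance match these specs is to print the runtime details from within the notebook (the helper name runtime_summary is illustrative):

```python
import os
import platform

def runtime_summary() -> dict:
    """Report the kernel's Python version and the instance's vCPU count."""
    return {"python": platform.python_version(), "vcpus": os.cpu_count()}

# On an ml.t3.medium with the MXNet 1.8 image you would expect
# roughly {"python": "3.7.x", "vcpus": 2}.
print(runtime_summary())
```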
Install packages¶
!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
# Without --no-cache-dir, smaller aws instances may have trouble installing
[pip install output condensed] Successfully installed pip-24.0, bokeh-2.0.1, graphviz-0.8.4, and mxnet-1.9.1; autogluon 0.8.2 and most of its dependencies were already satisfied, with the remainder (hyperopt, ray, gpustat, opencensus, etc.) collected and installed.
blessed>=1.17.1->gpustat>=1.0.0->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) (0.2.13) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) Downloading googleapis_common_protos-1.63.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<2.7,>=2.6.3; extra == "all"->autogluon.core[all]==0.8.2->autogluon) Downloading proto_plus-1.23.0-py3-none-any.whl.metadata (2.2 kB) Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==0.8.2->autogluon) (0.1.2) Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/conda/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (0.5.1) Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<3,>=2.9->autogluon.multimodal==0.8.2->autogluon) (3.2.2) Requirement already satisfied: blis<0.8.0,>=0.7.8 in /opt/conda/lib/python3.10/site-packages (from thinc<8.3.0,>=8.2.2->spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.7.10) Requirement already satisfied: confection<1.0.0,>=0.0.1 in /opt/conda/lib/python3.10/site-packages (from thinc<8.3.0,>=8.2.2->spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.1.4) Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from weasel<0.4.0,>=0.1.0->spacy<4->fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.8.2->autogluon) (0.16.0) Requirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.10/site-packages (from 
beautifulsoup4->gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==0.8.2->autogluon) (2.5) Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /opt/conda/lib/python3.10/site-packages (from requests[socks]->gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==0.8.2->autogluon) (1.7.1) Downloading hyperopt-0.2.7-py2.py3-none-any.whl (1.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 271.6 MB/s eta 0:00:00 Downloading ray-2.6.3-cp310-cp310-manylinux2014_x86_64.whl (56.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.9/56.9 MB 250.5 MB/s eta 0:00:00a 0:00:01 Downloading py_spy-0.3.14-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 355.4 MB/s eta 0:00:00 Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.7/101.7 kB 349.3 MB/s eta 0:00:00 Downloading virtualenv-20.21.0-py3-none-any.whl (8.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 220.8 MB/s eta 0:00:00a 0:00:01 Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB) Downloading colorful-0.5.6-py2.py3-none-any.whl (201 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.4/201.4 kB 62.2 MB/s eta 0:00:00 Downloading opencensus-0.11.4-py2.py3-none-any.whl (128 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 128.2/128.2 kB 319.5 MB/s eta 0:00:00 Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 kB 270.8 MB/s eta 0:00:00 Downloading blessed-1.20.0-py2.py3-none-any.whl (58 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 186.2 MB/s eta 0:00:00 Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.9/468.9 kB 361.8 MB/s eta 0:00:00 Downloading google_api_core-2.18.0-py3-none-any.whl (138 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 138.3/138.3 kB 286.0 MB/s eta 0:00:00 Downloading nvidia_ml_py-12.550.52-py3-none-any.whl (39 kB) Downloading 
opencensus_context-0.1.3-py2.py3-none-any.whl (5.1 kB) Downloading platformdirs-3.11.0-py3-none-any.whl (17 kB) Downloading googleapis_common_protos-1.63.0-py2.py3-none-any.whl (229 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.1/229.1 kB 385.9 MB/s eta 0:00:00 Downloading proto_plus-1.23.0-py3-none-any.whl (48 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.8/48.8 kB 263.2 MB/s eta 0:00:00 Building wheels for collected packages: gpustat Building wheel for gpustat (pyproject.toml) ... done Created wheel for gpustat: filename=gpustat-1.1.1-py3-none-any.whl size=26532 sha256=e012e914502463f607973af726715ec5ee30e5c48da07986af8312ce2322c70e Stored in directory: /tmp/pip-ephem-wheel-cache-ut2h164v/wheels/ec/d7/80/a71ba3540900e1f276bcae685efd8e590c810d2108b95f1e47 Successfully built gpustat Installing collected packages: py4j, py-spy, opencensus-context, nvidia-ml-py, distlib, colorful, tensorboardX, proto-plus, platformdirs, googleapis-common-protos, blessed, virtualenv, ray, hyperopt, gpustat, google-api-core, aiohttp-cors, opencensus Attempting uninstall: platformdirs Found existing installation: platformdirs 4.2.0 Uninstalling platformdirs-4.2.0: Successfully uninstalled platformdirs-4.2.0 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. sparkmagic 0.21.0 requires pandas<2.0.0,>=0.17.1, but you have pandas 2.1.4 which is incompatible. Successfully installed aiohttp-cors-0.7.0 blessed-1.20.0 colorful-0.5.6 distlib-0.3.8 google-api-core-2.18.0 googleapis-common-protos-1.63.0 gpustat-1.1.1 hyperopt-0.2.7 nvidia-ml-py-12.550.52 opencensus-0.11.4 opencensus-context-0.1.3 platformdirs-3.11.0 proto-plus-1.23.0 py-spy-0.3.14 py4j-0.10.9.7 ray-2.6.3 tensorboardX-2.6.2.2 virtualenv-20.21.0
Setup Kaggle API Key¶
# install the kaggle CLI
!pip install -q kaggle
# create the .kaggle config directory (-p avoids an error if it already exists)
!mkdir -p ~/.kaggle
# copy the downloaded API token and restrict its permissions
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
# create the .kaggle directory and an empty kaggle.json file
# note: /root is not writable in SageMaker Studio (the kernel runs as sagemaker-user),
# so create the file under the current user's home directory instead
!mkdir -p $HOME/.kaggle
!touch $HOME/.kaggle/kaggle.json
!chmod 600 $HOME/.kaggle/kaggle.json
# Fill in your user name and key from creating the kaggle account and API token file
import json
import os

kaggle_username = "<your-kaggle-username>"
kaggle_key = "<your-kaggle-api-key>"

# Save the API token to the kaggle.json file in the current user's home directory
# (hard-coding /root/.kaggle/kaggle.json raises PermissionError for non-root users)
with open(os.path.expanduser("~/.kaggle/kaggle.json"), "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
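If writing `kaggle.json` is inconvenient (for example when the home directory is not writable), the kaggle CLI can also read credentials from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. A minimal sketch with placeholder values:

```python
import os

# the kaggle package checks these environment variables before
# falling back to ~/.kaggle/kaggle.json (placeholder values shown)
os.environ["KAGGLE_USERNAME"] = "<your-kaggle-username>"
os.environ["KAGGLE_KEY"] = "<your-kaggle-api-key>"
```

Set these before the first `kaggle` command in the notebook; they only apply to the current kernel session.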
Download and explore dataset¶
Go to the bike sharing demand competition and agree to the terms¶
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you have already unzipped it before, the -o flag tells unzip to overwrite existing files
!unzip -o bike-sharing-demand.zip
Downloading bike-sharing-demand.zip to /home/sagemaker-user/cd0385-project-starter/project 0%| | 0.00/189k [00:00<?, ?B/s] 100%|████████████████████████████████████████| 189k/189k [00:00<00:00, 54.4MB/s] Archive: bike-sharing-demand.zip inflating: sampleSubmission.csv inflating: test.csv inflating: train.csv
import pandas as pd
from autogluon.tabular import TabularPredictor
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv('train.csv', parse_dates=["datetime"])
train.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 |
# Simple output of the train dataset to view the min/max/variation of the dataset features.
train.describe()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10886 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.00000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 |
| mean | 2011-12-27 05:56:22.399411968 | 2.506614 | 0.028569 | 0.680875 | 1.418427 | 20.23086 | 23.655084 | 61.886460 | 12.799395 | 36.021955 | 155.552177 | 191.574132 |
| min | 2011-01-01 00:00:00 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.82000 | 0.760000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 2011-07-02 07:15:00 | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 13.94000 | 16.665000 | 47.000000 | 7.001500 | 4.000000 | 36.000000 | 42.000000 |
| 50% | 2012-01-01 20:30:00 | 3.000000 | 0.000000 | 1.000000 | 1.000000 | 20.50000 | 24.240000 | 62.000000 | 12.998000 | 17.000000 | 118.000000 | 145.000000 |
| 75% | 2012-07-01 12:45:00 | 4.000000 | 0.000000 | 1.000000 | 2.000000 | 26.24000 | 31.060000 | 77.000000 | 16.997900 | 49.000000 | 222.000000 | 284.000000 |
| max | 2012-12-19 23:00:00 | 4.000000 | 1.000000 | 1.000000 | 4.000000 | 41.00000 | 45.455000 | 100.000000 | 56.996900 | 367.000000 | 886.000000 | 977.000000 |
| std | NaN | 1.116174 | 0.166599 | 0.466159 | 0.633839 | 7.79159 | 8.474601 | 19.245033 | 8.164537 | 49.960477 | 151.039033 | 181.144454 |
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv('test.csv', parse_dates=["datetime"])
test.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-20 00:00:00 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 |
| 1 | 2011-01-20 01:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 2 | 2011-01-20 02:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 3 | 2011-01-20 03:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
| 4 | 2011-01-20 04:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
# Same as the train and test datasets: read the sample submission and parse the datetime
submission = pd.read_csv('sampleSubmission.csv', parse_dates=["datetime"])
submission.head()
| datetime | count | |
|---|---|---|
| 0 | 2011-01-20 00:00:00 | 0 |
| 1 | 2011-01-20 01:00:00 | 0 |
| 2 | 2011-01-20 02:00:00 | 0 |
| 3 | 2011-01-20 03:00:00 | 0 |
| 4 | 2011-01-20 04:00:00 | 0 |
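Because the `datetime` column was parsed above, additional features can later be derived from it with the pandas `dt` accessor. A minimal sketch on a small stand-in frame (the `hour` and `dayofweek` column names are illustrative choices, not part of the template):

```python
import pandas as pd

# small stand-in frame with the same parsed-datetime dtype as train/test
df = pd.DataFrame({"datetime": pd.to_datetime(["2011-01-01 05:00:00",
                                               "2011-01-01 17:00:00"])})

# derive hour-of-day and day-of-week features via the dt accessor
df["hour"] = df["datetime"].dt.hour
df["dayofweek"] = df["datetime"].dt.dayofweek
```

The same two lines applied to `train` and `test` give the model access to the strong hourly and weekly seasonality in bike demand.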
Step 3: Train a model using AutoGluon’s Tabular Prediction¶
Requirements:
- We are predicting `count`, so it is the label we are setting.
- Ignore the `casual` and `registered` columns, as they are not present in the test dataset.
- Use `root_mean_squared_error` as the evaluation metric.
- Set a time limit of 10 minutes (600 seconds).
- Use the preset `best_quality` to focus on creating the best model.
ignored_columns = ["casual", "registered"]
predictor = TabularPredictor(
label='count',
problem_type="regression",
eval_metric='root_mean_squared_error',
learner_kwargs={'ignored_columns': ignored_columns}
).fit(train_data=train, time_limit=600, presets='best_quality')
No path specified. Models will be saved in: "AutogluonModels/ag-20240430_152258"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20240430_152258"
AutoGluon Version: 0.8.2
Python Version: 3.10.14
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Sat Mar 23 09:49:55 UTC 2024
Disk Space Avail: 3.78 GB / 5.36 GB (70.5%)
WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception.
We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows: 10886
Train Data Columns: 11
Label Column: count
Preprocessing data ...
/opt/conda/lib/python3.10/site-packages/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context("mode.use_inf_as_na", True): # treat None, NaN, INF, NINF as NA
Using Feature Generators to preprocess the data ...
Dropping user-specified ignored columns: ['casual', 'registered']
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 1928.75 MB
Train Data (Original) Memory Usage: 0.78 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting DatetimeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('datetime', []) : 1 | ['datetime']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 5 | ['season', 'holiday', 'workingday', 'weather', 'humidity']
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.1s = Fit runtime
9 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.98 MB (0.1% of available memory)
Data preprocessing and feature engineering runtime = 0.16s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
'NN_TORCH': {},
'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
'CAT': {},
'XGB': {},
'FASTAI': {},
'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.79s of the 599.84s of remaining time.
-101.5462 = Validation score (-root_mean_squared_error)
0.05s = Training runtime
0.06s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.64s of the 599.68s of remaining time.
-84.1251 = Validation score (-root_mean_squared_error)
0.05s = Training runtime
0.05s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.49s of the 599.54s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 131.684 [2000] valid_set's rmse: 130.67 [3000] valid_set's rmse: 130.626 [1000] valid_set's rmse: 135.592 [1000] valid_set's rmse: 133.481 [2000] valid_set's rmse: 132.323 [3000] valid_set's rmse: 131.618 [4000] valid_set's rmse: 131.443 [5000] valid_set's rmse: 131.265 [6000] valid_set's rmse: 131.277 [7000] valid_set's rmse: 131.443 [1000] valid_set's rmse: 128.503 [2000] valid_set's rmse: 127.654 [3000] valid_set's rmse: 127.227 [4000] valid_set's rmse: 127.105 [1000] valid_set's rmse: 134.135 [2000] valid_set's rmse: 132.272 [3000] valid_set's rmse: 131.286 [4000] valid_set's rmse: 130.752 [5000] valid_set's rmse: 130.363 [6000] valid_set's rmse: 130.509 [1000] valid_set's rmse: 136.168 [2000] valid_set's rmse: 135.138 [3000] valid_set's rmse: 135.029 [1000] valid_set's rmse: 134.061 [2000] valid_set's rmse: 133.034 [3000] valid_set's rmse: 132.182 [4000] valid_set's rmse: 131.997 [5000] valid_set's rmse: 131.643 [6000] valid_set's rmse: 131.504 [7000] valid_set's rmse: 131.574 [1000] valid_set's rmse: 132.912 [2000] valid_set's rmse: 131.703 [3000] valid_set's rmse: 131.117 [4000] valid_set's rmse: 130.82 [5000] valid_set's rmse: 130.673 [6000] valid_set's rmse: 130.708
-131.4609 = Validation score (-root_mean_squared_error)
57.25s = Training runtime
9.6s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 325.45s of the 525.49s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 130.818 [1000] valid_set's rmse: 133.204 [1000] valid_set's rmse: 130.928 [1000] valid_set's rmse: 126.846 [1000] valid_set's rmse: 131.426 [1000] valid_set's rmse: 133.655 [1000] valid_set's rmse: 132.155 [1000] valid_set's rmse: 130.62
-131.0542 = Validation score (-root_mean_squared_error)
17.01s = Training runtime
1.39s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 304.91s of the 504.95s of remaining time.
-116.5484 = Validation score (-root_mean_squared_error)
16.17s = Training runtime
0.85s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 287.19s of the 487.23s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 4125.
Ran out of time, early stopping on iteration 4255.
Ran out of time, early stopping on iteration 4104.
Ran out of time, early stopping on iteration 4438.
Ran out of time, early stopping on iteration 4498.
-130.5806 = Validation score (-root_mean_squared_error)
237.03s = Training runtime
0.09s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 49.96s of the 250.0s of remaining time.
-124.6007 = Validation score (-root_mean_squared_error)
8.3s = Training runtime
0.68s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 40.49s of the 240.54s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 6)
Ran out of time, stopping training early. (Stopping on epoch 5)
Ran out of time, stopping training early. (Stopping on epoch 4)
Ran out of time, stopping training early. (Stopping on epoch 9)
Ran out of time, stopping training early. (Stopping on epoch 15)
-140.0803 = Validation score (-root_mean_squared_error)
38.33s = Training runtime
0.32s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 1.68s of the 201.73s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping XGBoost_BAG_L1.
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 1.44s of the 201.48s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping NeuralNetTorch_BAG_L1.
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 1.27s of the 201.31s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 179.334
Time limit exceeded... Skipping LightGBMLarge_BAG_L1.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 200.63s of remaining time.
-84.1251 = Validation score (-root_mean_squared_error)
0.59s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 200.01s of the 200.0s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 60.6212 [2000] valid_set's rmse: 60.0139 [1000] valid_set's rmse: 60.8505 [2000] valid_set's rmse: 59.7802 [1000] valid_set's rmse: 63.5014 [2000] valid_set's rmse: 62.3981 [1000] valid_set's rmse: 64.3139 [2000] valid_set's rmse: 62.4806 [1000] valid_set's rmse: 58.8796 [2000] valid_set's rmse: 57.875 [1000] valid_set's rmse: 63.3716 [2000] valid_set's rmse: 62.1822 [1000] valid_set's rmse: 63.2193 [2000] valid_set's rmse: 62.0194 [1000] valid_set's rmse: 58.3153
-60.5181 = Validation score (-root_mean_squared_error)
47.83s = Training runtime
3.84s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 144.37s of the 144.35s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-55.1358 = Validation score (-root_mean_squared_error)
13.84s = Training runtime
0.23s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 129.97s of the 129.96s of remaining time.
-53.32 = Validation score (-root_mean_squared_error)
40.99s = Training runtime
1.11s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 87.2s of the 87.19s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 1212.
Ran out of time, early stopping on iteration 1311.
Ran out of time, early stopping on iteration 1429.
Ran out of time, early stopping on iteration 1165.
Ran out of time, early stopping on iteration 1369.
Ran out of time, early stopping on iteration 1519.
Ran out of time, early stopping on iteration 1647.
-55.2556 = Validation score (-root_mean_squared_error)
80.04s = Training runtime
0.05s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 7.03s of the 7.02s of remaining time.
-53.7902 = Validation score (-root_mean_squared_error)
15.3s = Training runtime
0.89s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -9.82s of remaining time.
-52.7696 = Validation score (-root_mean_squared_error)
0.36s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 610.22s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240430_152258")
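Regression models can produce negative predictions, but Kaggle rejects negative `count` values for this competition. A common post-processing step before building the submission is to clip predictions to zero; a sketch on stand-in values (not the predictor's actual output):

```python
import pandas as pd

# stand-in predictions; in the notebook these would come from predictor.predict(test)
predictions = pd.Series([12.3, -4.2, 150.0, -0.5])

# set all negative predictions to zero so Kaggle accepts the submission
predictions = predictions.clip(lower=0)
```

The clipped series can then be assigned to `submission["count"]` and written out with `to_csv`.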
Review AutoGluon's training run with ranking of models that did the best.¶
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -52.769599 15.322824 524.710763 0.000698 0.364609 3 True 15
1 RandomForestMSE_BAG_L2 -53.320041 14.154553 415.166024 1.110522 40.986593 2 True 12
2 ExtraTreesMSE_BAG_L2 -53.790163 13.933192 389.483496 0.889161 15.304065 2 True 14
3 LightGBM_BAG_L2 -55.135772 13.270000 388.016868 0.225970 13.837437 2 True 11
4 CatBoost_BAG_L2 -55.255559 13.096474 454.218059 0.052443 80.038627 2 True 13
5 LightGBMXT_BAG_L2 -60.518056 16.884971 422.006862 3.840941 47.827430 2 True 10
6 KNeighborsDist_BAG_L1 -84.125061 0.049080 0.054023 0.049080 0.054023 1 True 2
7 WeightedEnsemble_L2 -84.125061 0.049655 0.647780 0.000575 0.593757 2 True 9
8 KNeighborsUnif_BAG_L1 -101.546199 0.063212 0.045053 0.063212 0.045053 1 True 1
9 RandomForestMSE_BAG_L1 -116.548359 0.849170 16.166970 0.849170 16.166970 1 True 5
10 ExtraTreesMSE_BAG_L1 -124.600676 0.679650 8.297273 0.679650 8.297273 1 True 7
11 CatBoost_BAG_L1 -130.580587 0.086503 237.025483 0.086503 237.025483 1 True 6
12 LightGBM_BAG_L1 -131.054162 1.393983 17.014229 1.393983 17.014229 1 True 4
13 LightGBMXT_BAG_L1 -131.460909 9.598674 57.249527 9.598674 57.249527 1 True 3
14 NeuralNetFastAI_BAG_L1 -140.080292 0.323758 38.326873 0.323758 38.326873 1 True 8
Number of models trained: 15
Types of models trained:
{'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
*** End of fit() summary ***
/opt/conda/lib/python3.10/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
'KNeighborsDist_BAG_L1': -84.12506123181602,
'LightGBMXT_BAG_L1': -131.46090891834504,
'LightGBM_BAG_L1': -131.054161598899,
'RandomForestMSE_BAG_L1': -116.54835939455667,
'CatBoost_BAG_L1': -130.58058710604206,
'ExtraTreesMSE_BAG_L1': -124.60067564699747,
'NeuralNetFastAI_BAG_L1': -140.08029174378652,
'WeightedEnsemble_L2': -84.12506123181602,
'LightGBMXT_BAG_L2': -60.51805619636211,
'LightGBM_BAG_L2': -55.135771877586556,
'RandomForestMSE_BAG_L2': -53.320040985958315,
'CatBoost_BAG_L2': -55.25555940124764,
'ExtraTreesMSE_BAG_L2': -53.79016284992284,
'WeightedEnsemble_L3': -52.76959939021615},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': ['KNeighborsUnif_BAG_L1'],
'KNeighborsDist_BAG_L1': ['KNeighborsDist_BAG_L1'],
'LightGBMXT_BAG_L1': ['LightGBMXT_BAG_L1'],
'LightGBM_BAG_L1': ['LightGBM_BAG_L1'],
'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'],
'CatBoost_BAG_L1': ['CatBoost_BAG_L1'],
'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'],
'NeuralNetFastAI_BAG_L1': ['NeuralNetFastAI_BAG_L1'],
'WeightedEnsemble_L2': ['WeightedEnsemble_L2'],
'LightGBMXT_BAG_L2': ['LightGBMXT_BAG_L2'],
'LightGBM_BAG_L2': ['LightGBM_BAG_L2'],
'RandomForestMSE_BAG_L2': ['RandomForestMSE_BAG_L2'],
'CatBoost_BAG_L2': ['CatBoost_BAG_L2'],
'ExtraTreesMSE_BAG_L2': ['ExtraTreesMSE_BAG_L2'],
'WeightedEnsemble_L3': ['WeightedEnsemble_L3']},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.04505300521850586,
'KNeighborsDist_BAG_L1': 0.05402326583862305,
'LightGBMXT_BAG_L1': 57.24952697753906,
'LightGBM_BAG_L1': 17.014228582382202,
'RandomForestMSE_BAG_L1': 16.166969537734985,
'CatBoost_BAG_L1': 237.02548336982727,
'ExtraTreesMSE_BAG_L1': 8.297273397445679,
'NeuralNetFastAI_BAG_L1': 38.32687330245972,
'WeightedEnsemble_L2': 0.593756914138794,
'LightGBMXT_BAG_L2': 47.8274302482605,
'LightGBM_BAG_L2': 13.83743691444397,
'RandomForestMSE_BAG_L2': 40.98659300804138,
'CatBoost_BAG_L2': 80.03862738609314,
'ExtraTreesMSE_BAG_L2': 15.304064750671387,
'WeightedEnsemble_L3': 0.3646094799041748},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.06321167945861816,
'KNeighborsDist_BAG_L1': 0.04908013343811035,
'LightGBMXT_BAG_L1': 9.598673820495605,
'LightGBM_BAG_L1': 1.3939833641052246,
'RandomForestMSE_BAG_L1': 0.8491702079772949,
'CatBoost_BAG_L1': 0.0865027904510498,
'ExtraTreesMSE_BAG_L1': 0.6796503067016602,
'NeuralNetFastAI_BAG_L1': 0.3237583637237549,
'WeightedEnsemble_L2': 0.0005748271942138672,
'LightGBMXT_BAG_L2': 3.8409407138824463,
'LightGBM_BAG_L2': 0.22596955299377441,
'RandomForestMSE_BAG_L2': 1.1105222702026367,
'CatBoost_BAG_L2': 0.05244302749633789,
'ExtraTreesMSE_BAG_L2': 0.8891608715057373,
'WeightedEnsemble_L3': 0.000698089599609375},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -52.769599 15.322824 524.710763
1 RandomForestMSE_BAG_L2 -53.320041 14.154553 415.166024
2 ExtraTreesMSE_BAG_L2 -53.790163 13.933192 389.483496
3 LightGBM_BAG_L2 -55.135772 13.270000 388.016868
4 CatBoost_BAG_L2 -55.255559 13.096474 454.218059
5 LightGBMXT_BAG_L2 -60.518056 16.884971 422.006862
6 KNeighborsDist_BAG_L1 -84.125061 0.049080 0.054023
7 WeightedEnsemble_L2 -84.125061 0.049655 0.647780
8 KNeighborsUnif_BAG_L1 -101.546199 0.063212 0.045053
9 RandomForestMSE_BAG_L1 -116.548359 0.849170 16.166970
10 ExtraTreesMSE_BAG_L1 -124.600676 0.679650 8.297273
11 CatBoost_BAG_L1 -130.580587 0.086503 237.025483
12 LightGBM_BAG_L1 -131.054162 1.393983 17.014229
13 LightGBMXT_BAG_L1 -131.460909 9.598674 57.249527
14 NeuralNetFastAI_BAG_L1 -140.080292 0.323758 38.326873
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000698 0.364609 3 True
1 1.110522 40.986593 2 True
2 0.889161 15.304065 2 True
3 0.225970 13.837437 2 True
4 0.052443 80.038627 2 True
5 3.840941 47.827430 2 True
6 0.049080 0.054023 1 True
7 0.000575 0.593757 2 True
8 0.063212 0.045053 1 True
9 0.849170 16.166970 1 True
10 0.679650 8.297273 1 True
11 0.086503 237.025483 1 True
12 1.393983 17.014229 1 True
13 9.598674 57.249527 1 True
14 0.323758 38.326873 1 True
fit_order
0 15
1 12
2 14
3 11
4 13
5 10
6 2
7 9
8 1
9 5
10 7
11 6
12 4
13 3
14 8 }
Create predictions from test dataset¶
predictions = predictor.predict(test)
predictions = {'datetime': test['datetime'], 'Pred_count': predictions}
predictions = pd.DataFrame(data=predictions)
predictions.head()
| | datetime | Pred_count |
|---|---|---|
| 0 | 2011-01-20 00:00:00 | 23.629269 |
| 1 | 2011-01-20 01:00:00 | 41.970566 |
| 2 | 2011-01-20 02:00:00 | 46.314308 |
| 3 | 2011-01-20 03:00:00 | 49.542381 |
| 4 | 2011-01-20 04:00:00 | 52.041100 |
NOTE: Kaggle will reject the submission if any predicted count is negative, so all predictions must be set to be >= 0.¶
# Describe the `predictions` dataframe to check for negative values
predictions.describe()
| | datetime | Pred_count |
|---|---|---|
| count | 6493 | 6493.000000 |
| mean | 2012-01-13 09:27:47.765285632 | 101.197556 |
| min | 2011-01-20 00:00:00 | 2.784252 |
| 25% | 2011-07-22 15:00:00 | 20.890858 |
| 50% | 2012-01-20 23:00:00 | 64.331596 |
| 75% | 2012-07-20 17:00:00 | 169.635635 |
| max | 2012-12-31 23:00:00 | 362.269684 |
| std | NaN | 90.369186 |
# How many negative values do we have?
def calNeg(val):
return val[val < 0].sum()
NegV = predictions.groupby(predictions['Pred_count'])
re = NegV['Pred_count'].agg([('No.of Negative values', calNeg)])
print(re)
No.of Negative values Pred_count 2.784252 0.0 2.816588 0.0 2.818243 0.0 2.928141 0.0 3.029108 0.0 ... ... 361.349609 0.0 361.419434 0.0 361.767517 0.0 361.864990 0.0 362.269684 0.0 [6270 rows x 1 columns]
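Note that `calNeg` actually sums the negative values rather than counting them, and grouping by `Pred_count` itself produces one row per unique prediction, which is hard to read. A vectorized boolean mask is more direct; a minimal sketch, using a hypothetical stand-in for the `predictions` dataframe:

```python
import pandas as pd

# Hypothetical stand-in for the `predictions` DataFrame built above
predictions = pd.DataFrame({"Pred_count": [2.78, -1.5, 64.33, -0.2, 362.27]})

# Count negative predictions directly with a boolean mask
num_negative = int((predictions["Pred_count"] < 0).sum())
print(num_negative)
```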
# Set any negative predictions to zero so Kaggle accepts the submission
predictions['Pred_count'] = predictions['Pred_count'].clip(lower=0)
Set predictions to submission dataframe, save, and submit¶
submission["count"] = predictions['Pred_count']
submission.to_csv("submission.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 571kB/s] Successfully submitted to Bike Sharing Demand
View submission via the command line or in the web browser under the competition's page - My Submissions¶
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName date description status publicScore privateScore -------------- ------------------- -------------------- -------- ----------- ------------ submission.csv 2024-04-30 15:59:33 first raw submission complete 1.79816 1.79816
Initial score of 1.79816¶
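For context, Kaggle scores this competition with root mean squared logarithmic error (RMSLE), which is why the leaderboard score is on a different scale from the RMSE values AutoGluon reports during training. A minimal sketch of the metric (the function name is mine, not AutoGluon's):

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error (the Kaggle metric for this competition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# A perfect prediction scores 0; the log transform penalizes an absolute error
# on small counts relatively more than the same error on large counts.
print(rmsle([10, 20, 30], [10, 20, 30]))
```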
Step 4: Exploratory Data Analysis and Creating an additional feature¶
- Any additional feature will do, but a great suggestion would be to separate out the datetime into hour, day, or month parts.
# Create a histogram of each feature to show its distribution. This is part of the exploratory data analysis
train.hist(figsize=(15,15))
array([[<Axes: title={'center': 'datetime'}>,
<Axes: title={'center': 'season'}>,
<Axes: title={'center': 'holiday'}>],
[<Axes: title={'center': 'workingday'}>,
<Axes: title={'center': 'weather'}>,
<Axes: title={'center': 'temp'}>],
[<Axes: title={'center': 'atemp'}>,
<Axes: title={'center': 'humidity'}>,
<Axes: title={'center': 'windspeed'}>],
[<Axes: title={'center': 'casual'}>,
<Axes: title={'center': 'registered'}>,
<Axes: title={'center': 'count'}>]], dtype=object)
import matplotlib.pyplot as plt
import seaborn as sns
corrD = train.copy()
corr_map = corrD.drop(columns=['datetime']).corr()
fig, ax = plt.subplots(figsize = (15,15))
sns.heatmap(corr_map, square = True, annot = True, cmap = 'coolwarm', ax = ax, cbar_kws = {'shrink': 0.8})
ax.set_title('Correlation Between Numerical Variables')
# Create new features from the datetime column
# Train
train["year"] = train["datetime"].dt.year
train["month"] = train["datetime"].dt.month
train["day"] = train["datetime"].dt.dayofweek
train["hour"] = train["datetime"].dt.hour
# Drop datetime
train.drop(["datetime"], axis=1, inplace=True)
train.head()
| | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | year | month | day | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 | 2011 | 1 | 5 | 0 |
| 1 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 | 2011 | 1 | 5 | 1 |
| 2 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 | 2011 | 1 | 5 | 2 |
| 3 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 | 2011 | 1 | 5 | 3 |
| 4 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 | 2011 | 1 | 5 | 4 |
# Test
test["year"] = test["datetime"].dt.year
test["month"] = test["datetime"].dt.month
test["day"] = test["datetime"].dt.dayofweek
test["hour"] = test["datetime"].dt.hour
# Drop datetime
test.drop(["datetime"], axis=1, inplace=True)
test.head()
| | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | year | month | day | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 | 2011 | 1 | 3 | 0 |
| 1 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 2011 | 1 | 3 | 1 |
| 2 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 2011 | 1 | 3 | 2 |
| 3 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 2011 | 1 | 3 | 3 |
| 4 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 2011 | 1 | 3 | 4 |
Make category types for these so models know they are not just numbers¶
- AutoGluon originally sees these as ints, but in reality they are int representations of a category.
- Setting the dtype to category will classify these as categories in AutoGluon.
train["season"] = train["season"].astype("category")
train["weather"] = train["weather"].astype("category")
test["season"] = test["season"].astype("category")
test["weather"] = test["weather"].astype("category")
train.info(), test.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 10886 entries, 0 to 10885 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 season 10886 non-null category 1 holiday 10886 non-null int64 2 workingday 10886 non-null int64 3 weather 10886 non-null category 4 temp 10886 non-null float64 5 atemp 10886 non-null float64 6 humidity 10886 non-null int64 7 windspeed 10886 non-null float64 8 casual 10886 non-null int64 9 registered 10886 non-null int64 10 count 10886 non-null int64 11 year 10886 non-null int32 12 month 10886 non-null int32 13 day 10886 non-null int32 14 hour 10886 non-null int32 dtypes: category(2), float64(3), int32(4), int64(6) memory usage: 957.3 KB <class 'pandas.core.frame.DataFrame'> RangeIndex: 6493 entries, 0 to 6492 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 season 6493 non-null category 1 holiday 6493 non-null int64 2 workingday 6493 non-null int64 3 weather 6493 non-null category 4 temp 6493 non-null float64 5 atemp 6493 non-null float64 6 humidity 6493 non-null int64 7 windspeed 6493 non-null float64 8 year 6493 non-null int32 9 month 6493 non-null int32 10 day 6493 non-null int32 11 hour 6493 non-null int32 dtypes: category(2), float64(3), int32(4), int64(3) memory usage: 419.0 KB
(None, None)
# View our new features
train.head(10)
| | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | year | month | day | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0000 | 3 | 13 | 16 | 2011 | 1 | 5 | 0 |
| 1 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0000 | 8 | 32 | 40 | 2011 | 1 | 5 | 1 |
| 2 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0000 | 5 | 27 | 32 | 2011 | 1 | 5 | 2 |
| 3 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0000 | 3 | 10 | 13 | 2011 | 1 | 5 | 3 |
| 4 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0000 | 0 | 1 | 1 | 2011 | 1 | 5 | 4 |
| 5 | 1 | 0 | 0 | 2 | 9.84 | 12.880 | 75 | 6.0032 | 0 | 1 | 1 | 2011 | 1 | 5 | 5 |
| 6 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0000 | 2 | 0 | 2 | 2011 | 1 | 5 | 6 |
| 7 | 1 | 0 | 0 | 1 | 8.20 | 12.880 | 86 | 0.0000 | 1 | 2 | 3 | 2011 | 1 | 5 | 7 |
| 8 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0000 | 1 | 7 | 8 | 2011 | 1 | 5 | 8 |
| 9 | 1 | 0 | 0 | 1 | 13.12 | 17.425 | 76 | 0.0000 | 8 | 6 | 14 | 2011 | 1 | 5 | 9 |
# View histogram of all features again, now with the new datetime-derived features
train.hist(figsize=(15,15))
array([[<Axes: title={'center': 'holiday'}>,
<Axes: title={'center': 'workingday'}>,
<Axes: title={'center': 'temp'}>,
<Axes: title={'center': 'atemp'}>],
[<Axes: title={'center': 'humidity'}>,
<Axes: title={'center': 'windspeed'}>,
<Axes: title={'center': 'casual'}>,
<Axes: title={'center': 'registered'}>],
[<Axes: title={'center': 'count'}>,
<Axes: title={'center': 'year'}>,
<Axes: title={'center': 'month'}>,
<Axes: title={'center': 'day'}>],
[<Axes: title={'center': 'hour'}>, <Axes: >, <Axes: >, <Axes: >]],
dtype=object)
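A quick sanity check on the new `hour` feature is to look at mean rentals per hour of day, which should peak around commute times. A minimal sketch with toy data (the notebook itself would group the full `train` dataframe):

```python
import pandas as pd

# Toy stand-in for `train`; the real notebook would use the full dataset
toy = pd.DataFrame({"hour": [8, 8, 17, 17, 3], "count": [300, 320, 400, 380, 20]})

# Mean rentals per hour-of-day; commute hours should dominate
hourly_mean = toy.groupby("hour")["count"].mean()
print(hourly_mean.to_dict())
```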
Step 5: Rerun the model with the same settings as before, just with more features¶
ignored_columns = ["casual", "registered"]
predictor_new_features = TabularPredictor(
label='count',
problem_type="regression",
eval_metric='root_mean_squared_error',
learner_kwargs={'ignored_columns': ignored_columns}
).fit(train_data=train, time_limit=600, presets='best_quality')
No path specified. Models will be saved in: "AutogluonModels/ag-20240430_170549"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20240430_170549"
AutoGluon Version: 0.8.2
Python Version: 3.10.14
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Sat Mar 23 09:49:55 UTC 2024
Disk Space Avail: 1.10 GB / 5.36 GB (20.4%)
WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception.
We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows: 10886
Train Data Columns: 14
Label Column: count
Preprocessing data ...
/opt/conda/lib/python3.10/site-packages/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context("mode.use_inf_as_na", True): # treat None, NaN, INF, NINF as NA
Using Feature Generators to preprocess the data ...
Dropping user-specified ignored columns: ['casual', 'registered']
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 1680.57 MB
Train Data (Original) Memory Usage: 0.72 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
0.1s = Fit runtime
12 features in original data used to generate 12 features in processed data.
Train Data (Processed) Memory Usage: 0.53 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.13s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
'NN_TORCH': {},
'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
'CAT': {},
'XGB': {},
'FASTAI': {},
'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.81s of the 599.87s of remaining time.
-115.7332 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.14s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.61s of the 599.66s of remaining time.
-112.1571 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.21s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.33s of the 599.38s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 37.3955 [2000] valid_set's rmse: 35.8564 [3000] valid_set's rmse: 35.7733 [4000] valid_set's rmse: 35.749 [1000] valid_set's rmse: 38.4092 [2000] valid_set's rmse: 36.984 [3000] valid_set's rmse: 36.7048 [4000] valid_set's rmse: 36.6577 [5000] valid_set's rmse: 36.6682 [6000] valid_set's rmse: 36.6427 [1000] valid_set's rmse: 36.9097 [2000] valid_set's rmse: 35.5912 [3000] valid_set's rmse: 35.1505 [4000] valid_set's rmse: 34.9993 [5000] valid_set's rmse: 34.869 [6000] valid_set's rmse: 34.8566 [7000] valid_set's rmse: 34.8204 [8000] valid_set's rmse: 34.7883 [9000] valid_set's rmse: 34.7902 [10000] valid_set's rmse: 34.8132 [1000] valid_set's rmse: 38.5003 [2000] valid_set's rmse: 37.0041 [3000] valid_set's rmse: 36.7718 [4000] valid_set's rmse: 36.7333 [5000] valid_set's rmse: 36.7654 [1000] valid_set's rmse: 40.4421 [2000] valid_set's rmse: 38.8755 [3000] valid_set's rmse: 38.3805 [4000] valid_set's rmse: 38.1652 [5000] valid_set's rmse: 38.0954 [6000] valid_set's rmse: 38.042 [7000] valid_set's rmse: 38.027 [8000] valid_set's rmse: 38.0432 [1000] valid_set's rmse: 38.0702 [2000] valid_set's rmse: 35.7573 [3000] valid_set's rmse: 35.2602 [4000] valid_set's rmse: 35.0557 [5000] valid_set's rmse: 34.9124 [6000] valid_set's rmse: 34.8075 [7000] valid_set's rmse: 34.7336 [8000] valid_set's rmse: 34.757 [9000] valid_set's rmse: 34.823 [1000] valid_set's rmse: 40.6532 [2000] valid_set's rmse: 40.1092 [3000] valid_set's rmse: 39.9361 [4000] valid_set's rmse: 39.9075 [5000] valid_set's rmse: 39.8418 [6000] valid_set's rmse: 39.9598 [1000] valid_set's rmse: 37.1489 [2000] valid_set's rmse: 35.4784 [3000] valid_set's rmse: 35.2126 [4000] valid_set's rmse: 35.1509
-36.4599 = Validation score (-root_mean_squared_error) 71.66s = Training runtime 16.28s = Validation runtime Fitting model: LightGBM_BAG_L1 ... Training model for up to 302.02s of the 502.07s of remaining time. Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 35.0742 [1000] valid_set's rmse: 34.1338 [2000] valid_set's rmse: 33.9294 [1000] valid_set's rmse: 34.257 [2000] valid_set's rmse: 33.6373 [3000] valid_set's rmse: 33.4395 [4000] valid_set's rmse: 33.4325 [1000] valid_set's rmse: 37.3575 [2000] valid_set's rmse: 37.1945 [1000] valid_set's rmse: 38.1734 [2000] valid_set's rmse: 37.9207 [1000] valid_set's rmse: 33.4459 [2000] valid_set's rmse: 33.2585 [1000] valid_set's rmse: 39.4999 [1000] valid_set's rmse: 36.2444
-35.7969 = Validation score (-root_mean_squared_error)
26.46s = Training runtime
2.58s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 270.04s of the 470.1s of remaining time.
-39.5874 = Validation score (-root_mean_squared_error)
14.74s = Training runtime
0.81s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 253.82s of the 453.87s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 2452.
Ran out of time, early stopping on iteration 2468.
Ran out of time, early stopping on iteration 2245.
Ran out of time, early stopping on iteration 2377.
Ran out of time, early stopping on iteration 2600.
Ran out of time, early stopping on iteration 2861.
Ran out of time, early stopping on iteration 2996.
Ran out of time, early stopping on iteration 3485.
-35.9177 = Validation score (-root_mean_squared_error)
243.43s = Training runtime
0.16s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 10.08s of the 210.13s of remaining time.
-39.0334 = Validation score (-root_mean_squared_error)
7.74s = Training runtime
0.77s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 0.86s of the 200.91s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping NeuralNetFastAI_BAG_L1.
Fitting model: XGBoost_BAG_L1 ... Training model for up to 0.76s of the 200.81s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping XGBoost_BAG_L1.
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 0.6s of the 200.65s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping NeuralNetTorch_BAG_L1.
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 0.49s of the 200.54s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 1. Best iteration is:
[1] valid_set's rmse: 176.857
Time limit exceeded... Skipping LightGBMLarge_BAG_L1.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 199.74s of remaining time.
-34.1692 = Validation score (-root_mean_squared_error)
0.54s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 199.18s of the 199.17s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-35.0936 = Validation score (-root_mean_squared_error)
13.8s = Training runtime
0.32s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 184.77s of the 184.76s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.5142 = Validation score (-root_mean_squared_error)
12.19s = Training runtime
0.11s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 172.33s of the 172.31s of remaining time.
-34.8467 = Validation score (-root_mean_squared_error)
36.13s = Training runtime
0.71s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 135.0s of the 134.98s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.0798 = Validation score (-root_mean_squared_error)
66.73s = Training runtime
0.1s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 68.08s of the 68.07s of remaining time.
Warning: Exception caused ExtraTreesMSE_BAG_L2 to fail during training... Skipping this model.
[Errno 28] No space left on device
Detailed Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 250, in _fit
self._fit_single(X=X, y=y, model_base=model_base, use_child_oof=use_child_oof, skip_oof=_skip_oof, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 442, in _fit_single
self.save_child(model_base)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
child.save(verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1035, in save
save_pkl.save(path=file_path, object=self, verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 27, in save
save_with_fn(validated_path, object, pickle_fn, format=format, verbose=verbose, compression_fn=compression_fn, compression_fn_kwargs=compression_fn_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 47, in save_with_fn
pickle_fn(object, fout)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 25, in pickle_fn
return pickle.dump(o, buffer, protocol=4)
OSError: [Errno 28] No space left on device
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 54.08s of the 54.07s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 6)
Ran out of time, stopping training early. (Stopping on epoch 11)
Warning: Exception caused NeuralNetFastAI_BAG_L2 to fail during training... Skipping this model.
[enforce fail at inline_container.cc:337] . unexpected pos 64 vs 0
Detailed Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 441, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 655, in _save
zip_file.write_record('data.pkl', data_value, len(data_value))
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
self._fit_folds(
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
fold_fitting_strategy.after_all_folds_scheduled()
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
self._fit_fold_model(job)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 316, in _fit_fold_model
self._update_bagged_ensemble(fold_model, pred_proba, fold_ctx)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 246, in _update_bagged_ensemble
self.bagged_ensemble_model.save_child(fold_model, verbose=False)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
child.save(verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/fastainn/tabular_nn_fastai.py", line 487, in save
save_pkl.save_with_fn(f"{path}{self.model_internals_file_name}", self.model, pickle_fn=lambda m, buffer: export(m, buffer), verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 47, in save_with_fn
pickle_fn(object, fout)
File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/fastainn/tabular_nn_fastai.py", line 487, in <lambda>
save_pkl.save_with_fn(f"{path}{self.model_internals_file_name}", self.model, pickle_fn=lambda m, buffer: export(m, buffer), verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/fastainn/fastai_helpers.py", line 26, in export
torch.save(model, target, pickle_module=pickle_module, pickle_protocol=pickle_protocol)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 305, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 64 vs 0
Fitting model: XGBoost_BAG_L2 ... Training model for up to 33.25s of the 33.23s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Warning: Exception caused XGBoost_BAG_L2 to fail during training... Skipping this model.
[17:15:17] /home/conda/feedstock_root/build_artifacts/xgboost-split_1700181168148/work/dmlc-core/src/io/local_filesys.cc:38: Check failed: std::fwrite(ptr, 1, size, fp_) == size: FileStream.Write incomplete
Stack trace:
[bt] (0) /opt/conda/lib/libxgboost.so(+0xb6361) [0x7fc26123e361]
[bt] (1) /opt/conda/lib/libxgboost.so(+0x5131b0) [0x7fc26169b1b0]
[bt] (2) /opt/conda/lib/libxgboost.so(XGBoosterSaveModel+0x464) [0x7fc261244794]
[bt] (3) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7fc2c2c92a4a]
[bt] (4) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x5fea) [0x7fc2c2c91fea]
[bt] (5) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x12461) [0x7fc2c2a08461]
[bt] (6) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x86eb) [0x7fc2c29fe6eb]
[bt] (7) /opt/conda/bin/python(_PyObject_MakeTpCall+0x26b) [0x56308f1eaa6b]
[bt] (8) /opt/conda/bin/python(_PyEval_EvalFrameDefault+0x54a6) [0x56308f1e69d6]
Detailed Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
self._fit_folds(
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
fold_fitting_strategy.after_all_folds_scheduled()
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
self._fit_fold_model(job)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 316, in _fit_fold_model
self._update_bagged_ensemble(fold_model, pred_proba, fold_ctx)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 246, in _update_bagged_ensemble
self.bagged_ensemble_model.save_child(fold_model, verbose=False)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
child.save(verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/xgboost/xgboost_model.py", line 210, in save
_model.save_model(os.path.join(path, "xgb.ubj"))
File "/opt/conda/lib/python3.10/site-packages/xgboost/sklearn.py", line 767, in save_model
self.get_booster().save_model(fname)
File "/opt/conda/lib/python3.10/site-packages/xgboost/core.py", line 2389, in save_model
_check_call(_LIB.XGBoosterSaveModel(
File "/opt/conda/lib/python3.10/site-packages/xgboost/core.py", line 279, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [17:15:17] /home/conda/feedstock_root/build_artifacts/xgboost-split_1700181168148/work/dmlc-core/src/io/local_filesys.cc:38: Check failed: std::fwrite(ptr, 1, size, fp_) == size: FileStream.Write incomplete
Stack trace:
[bt] (0) /opt/conda/lib/libxgboost.so(+0xb6361) [0x7fc26123e361]
[bt] (1) /opt/conda/lib/libxgboost.so(+0x5131b0) [0x7fc26169b1b0]
[bt] (2) /opt/conda/lib/libxgboost.so(XGBoosterSaveModel+0x464) [0x7fc261244794]
[bt] (3) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7fc2c2c92a4a]
[bt] (4) /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8(+0x5fea) [0x7fc2c2c91fea]
[bt] (5) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x12461) [0x7fc2c2a08461]
[bt] (6) /opt/conda/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x86eb) [0x7fc2c29fe6eb]
[bt] (7) /opt/conda/bin/python(_PyObject_MakeTpCall+0x26b) [0x56308f1eaa6b]
[bt] (8) /opt/conda/bin/python(_PyEval_EvalFrameDefault+0x54a6) [0x56308f1e69d6]
Fitting model: NeuralNetTorch_BAG_L2 ... Training model for up to 31.5s of the 31.49s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Warning: Exception caused NeuralNetTorch_BAG_L2 to fail during training... Skipping this model.
[enforce fail at inline_container.cc:337] . unexpected pos 6784 vs 6676
Detailed Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 441, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 668, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/2: file write failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
self._fit_folds(
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
fold_fitting_strategy.after_all_folds_scheduled()
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
self._fit_fold_model(job)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 314, in _fit_fold_model
fold_model = self._fit(self.model_base, time_start_fold, time_limit_fold, fold_ctx, self.model_base_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 349, in _fit
fold_model.fit(X=X_fold, y=y_fold, X_val=X_val_fold, y_val=y_val_fold, time_limit=time_limit_fold, num_cpus=num_cpus, num_gpus=num_gpus, **kwargs_fold)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 207, in _fit
self._train_net(
File "/opt/conda/lib/python3.10/site-packages/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 359, in _train_net
torch.save(self.model, net_filename)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 291, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 6784 vs 6676
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 30.7s of the 30.69s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Warning: Exception caused LightGBMLarge_BAG_L2 to fail during training... Skipping this model.
[Errno 28] No space left on device
Detailed Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 47, in save_with_fn
pickle_fn(object, fout)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 25, in pickle_fn
return pickle.dump(o, buffer, protocol=4)
OSError: [Errno 28] No space left on device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
self._fit_folds(
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
fold_fitting_strategy.after_all_folds_scheduled()
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 309, in after_all_folds_scheduled
self._fit_fold_model(job)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 316, in _fit_fold_model
self._update_bagged_ensemble(fold_model, pred_proba, fold_ctx)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 246, in _update_bagged_ensemble
self.bagged_ensemble_model.save_child(fold_model, verbose=False)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 792, in save_child
child.save(verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1035, in save
save_pkl.save(path=file_path, object=self, verbose=verbose)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 27, in save
save_with_fn(validated_path, object, pickle_fn, format=format, verbose=verbose, compression_fn=compression_fn, compression_fn_kwargs=compression_fn_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 46, in save_with_fn
with compression_fn_map[compression_fn]["open"](path, "wb", **compression_fn_kwargs) as fout:
OSError: [Errno 28] No space left on device
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 28.0s of remaining time.
Warning: Exception caused WeightedEnsemble_L3 to fail during training... Skipping this model.
[Errno 28] No space left on device
Detailed Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1761, in _train_and_save
model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1712, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 838, in fit
out = self._fit(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/weighted_ensemble_model.py", line 27, in _fit
super()._fit(X, y, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 165, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 228, in _fit
self.save_model_base(self.model_base)
File "/opt/conda/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 998, in save_model_base
save_pkl.save(path=os.path.join(self.path, "utils", "model_template.pkl"), object=model_base)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 27, in save
save_with_fn(validated_path, object, pickle_fn, format=format, verbose=verbose, compression_fn=compression_fn, compression_fn_kwargs=compression_fn_kwargs)
File "/opt/conda/lib/python3.10/site-packages/autogluon/common/savers/save_pkl.py", line 46, in save_with_fn
with compression_fn_map[compression_fn]["open"](path, "wb", **compression_fn_kwargs) as fout:
OSError: [Errno 28] No space left on device
AutoGluon training complete, total runtime = 572.03s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240430_170549")
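The repeated `[Errno 28] No space left on device` failures in the log above indicate the `ml.t3.medium` instance's volume filled up mid-fit. A minimal sketch (standard library only, names are my own) for checking free disk space before calling `fit()`; the 10 GB threshold follows AutoGluon's own warning, and is an assumption otherwise:

```python
import shutil

def free_disk_gb(path="."):
    """Return free disk space at `path` in gigabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9

# AutoGluon recommends at least 10 GB free before fitting;
# with less, model saves can fail exactly as in the log above.
if free_disk_gb() < 10:
    print("Warning: low disk space -- consider a larger volume "
          "or deleting old AutogluonModels/ directories.")
```

Deleting stale `AutogluonModels/ag-*` directories between experiments is usually the quickest way to reclaim space on a small SageMaker instance.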
predictor_new_features.fit_summary()
The history saving thread hit an unexpected error (OperationalError('database or disk is full')). History will not be written to the database.
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 CatBoost_BAG_L2 -34.079822 21.064626 430.813160 0.101764 66.727743 2 True 12
1 WeightedEnsemble_L2 -34.169171 20.610098 364.572216 0.000864 0.543764 2 True 8
2 LightGBM_BAG_L2 -34.514217 21.076707 376.275759 0.113845 12.190342 2 True 10
3 RandomForestMSE_BAG_L2 -34.846687 21.670112 400.217350 0.707250 36.131933 2 True 11
4 LightGBMXT_BAG_L2 -35.093569 21.283567 377.887459 0.320705 13.802042 2 True 9
5 LightGBM_BAG_L1 -35.796869 2.583707 26.457182 2.583707 26.457182 1 True 4
6 CatBoost_BAG_L1 -35.917713 0.162068 243.433814 0.162068 243.433814 1 True 6
7 LightGBMXT_BAG_L1 -36.459884 16.283618 71.660918 16.283618 71.660918 1 True 3
8 ExtraTreesMSE_BAG_L1 -39.033394 0.774064 7.738753 0.774064 7.738753 1 True 7
9 RandomForestMSE_BAG_L1 -39.587441 0.805777 14.737787 0.805777 14.737787 1 True 5
10 KNeighborsDist_BAG_L1 -112.157112 0.214200 0.026819 0.214200 0.026819 1 True 2
11 KNeighborsUnif_BAG_L1 -115.733231 0.139427 0.030145 0.139427 0.030145 1 True 1
Number of models trained: 12
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: False
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
*** End of fit() summary ***
/opt/conda/lib/python3.10/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost'},
'model_performance': {'KNeighborsUnif_BAG_L1': -115.73323148534313,
'KNeighborsDist_BAG_L1': -112.15711242835349,
'LightGBMXT_BAG_L1': -36.45988391821316,
'LightGBM_BAG_L1': -35.79686905713535,
'RandomForestMSE_BAG_L1': -39.587440921643605,
'CatBoost_BAG_L1': -35.91771266520655,
'ExtraTreesMSE_BAG_L1': -39.03339387756181,
'WeightedEnsemble_L2': -34.16917143656233,
'LightGBMXT_BAG_L2': -35.09356933666002,
'LightGBM_BAG_L2': -34.51421709944413,
'RandomForestMSE_BAG_L2': -34.84668738389667,
'CatBoost_BAG_L2': -34.07982161332563},
'model_best': 'WeightedEnsemble_L2',
'model_paths': {'KNeighborsUnif_BAG_L1': ['KNeighborsUnif_BAG_L1'],
'KNeighborsDist_BAG_L1': ['KNeighborsDist_BAG_L1'],
'LightGBMXT_BAG_L1': ['LightGBMXT_BAG_L1'],
'LightGBM_BAG_L1': ['LightGBM_BAG_L1'],
'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'],
'CatBoost_BAG_L1': ['CatBoost_BAG_L1'],
'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'],
'WeightedEnsemble_L2': ['WeightedEnsemble_L2'],
'LightGBMXT_BAG_L2': ['LightGBMXT_BAG_L2'],
'LightGBM_BAG_L2': ['LightGBM_BAG_L2'],
'RandomForestMSE_BAG_L2': ['RandomForestMSE_BAG_L2'],
'CatBoost_BAG_L2': ['CatBoost_BAG_L2']},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.03014516830444336,
'KNeighborsDist_BAG_L1': 0.026819467544555664,
'LightGBMXT_BAG_L1': 71.66091752052307,
'LightGBM_BAG_L1': 26.457181692123413,
'RandomForestMSE_BAG_L1': 14.737786531448364,
'CatBoost_BAG_L1': 243.43381357192993,
'ExtraTreesMSE_BAG_L1': 7.738753080368042,
'WeightedEnsemble_L2': 0.5437636375427246,
'LightGBMXT_BAG_L2': 13.802042245864868,
'LightGBM_BAG_L2': 12.19034218788147,
'RandomForestMSE_BAG_L2': 36.13193321228027,
'CatBoost_BAG_L2': 66.72774338722229},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.13942742347717285,
'KNeighborsDist_BAG_L1': 0.21420025825500488,
'LightGBMXT_BAG_L1': 16.283618211746216,
'LightGBM_BAG_L1': 2.5837066173553467,
'RandomForestMSE_BAG_L1': 0.8057773113250732,
'CatBoost_BAG_L1': 0.16206836700439453,
'ExtraTreesMSE_BAG_L1': 0.7740638256072998,
'WeightedEnsemble_L2': 0.0008637905120849609,
'LightGBMXT_BAG_L2': 0.32070493698120117,
'LightGBM_BAG_L2': 0.1138448715209961,
'RandomForestMSE_BAG_L2': 0.707249641418457,
'CatBoost_BAG_L2': 0.10176444053649902},
'num_bag_folds': 8,
'max_stack_level': 2,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 CatBoost_BAG_L2 -34.079822 21.064626 430.813160
1 WeightedEnsemble_L2 -34.169171 20.610098 364.572216
2 LightGBM_BAG_L2 -34.514217 21.076707 376.275759
3 RandomForestMSE_BAG_L2 -34.846687 21.670112 400.217350
4 LightGBMXT_BAG_L2 -35.093569 21.283567 377.887459
5 LightGBM_BAG_L1 -35.796869 2.583707 26.457182
6 CatBoost_BAG_L1 -35.917713 0.162068 243.433814
7 LightGBMXT_BAG_L1 -36.459884 16.283618 71.660918
8 ExtraTreesMSE_BAG_L1 -39.033394 0.774064 7.738753
9 RandomForestMSE_BAG_L1 -39.587441 0.805777 14.737787
10 KNeighborsDist_BAG_L1 -112.157112 0.214200 0.026819
11 KNeighborsUnif_BAG_L1 -115.733231 0.139427 0.030145
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.101764 66.727743 2 True
1 0.000864 0.543764 2 True
2 0.113845 12.190342 2 True
3 0.707250 36.131933 2 True
4 0.320705 13.802042 2 True
5 2.583707 26.457182 1 True
6 0.162068 243.433814 1 True
7 16.283618 71.660918 1 True
8 0.774064 7.738753 1 True
9 0.805777 14.737787 1 True
10 0.214200 0.026819 1 True
11 0.139427 0.030145 1 True
fit_order
0 12
1 8
2 10
3 11
4 9
5 4
6 6
7 3
8 7
9 5
10 2
11 1 }
predictions_new_features = predictor_new_features.predict(test)
predictions_new_features = pd.DataFrame(data=predictions_new_features)
predictions_new_features.head()
| count | |
|---|---|
| 0 | 15.498962 |
| 1 | 4.865256 |
| 2 | 2.895961 |
| 3 | 1.939850 |
| 4 | 1.702485 |
predictions_new_features.describe()
| count | |
|---|---|
| count | 6493.000000 |
| mean | 190.180817 |
| std | 173.996292 |
| min | -22.247910 |
| 25% | 44.992821 |
| 50% | 149.837234 |
| 75% | 282.708466 |
| max | 929.507874 |
# How many negative values do we have?
def calNeg(val):
    # Sum the negative entries within each group (0.0 when there are none)
    return val[val < 0].sum()

neg_groups = predictions_new_features.groupby(predictions_new_features['count'])
result = neg_groups['count'].agg([('No.of Negative values', calNeg)])
print(result)
            No.of Negative values
count
-22.247910             -22.247910
-18.666628             -18.666628
-16.526678             -16.526678
-15.943519             -15.943519
-12.128393             -12.128393
...                           ...
892.017151               0.000000
895.673157               0.000000
901.416870               0.000000
916.850342               0.000000
929.507874               0.000000

[6491 rows x 1 columns]
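The groupby above works, but a plain boolean mask answers "how many negatives?" more directly. A sketch on dummy data (assumption: `predictions_new_features` is a one-column DataFrame with a `count` column, as in this notebook):

```python
import pandas as pd

# Dummy predictions standing in for predictions_new_features
preds = pd.DataFrame({"count": [15.5, -22.2, 4.9, -0.3, 2.9]})

num_negative = (preds["count"] < 0).sum()                     # count of negatives
sum_negative = preds.loc[preds["count"] < 0, "count"].sum()   # their total
print(num_negative, sum_negative)
```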
# Remember to set all negative values to zero (Kaggle rejects negative counts)
predictions_new_features.loc[predictions_new_features['count'] < 0, 'count'] = 0
predictions_new_features.describe()
| count | |
|---|---|
| count | 6493.000000 |
| mean | 190.236237 |
| std | 173.934402 |
| min | 0.000000 |
| 25% | 44.992821 |
| 50% | 149.837234 |
| 75% | 282.708466 |
| max | 929.507874 |
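The masked assignment above does the job; `Series.clip` expresses the same "floor at zero" intent in a single call. A sketch on dummy data:

```python
import pandas as pd

preds = pd.DataFrame({"count": [-5.0, 3.2, 0.0, 10.5]})
# clip(lower=0) replaces every value below 0 with 0, leaving the rest unchanged
preds["count"] = preds["count"].clip(lower=0)
print(preds["count"].min())  # 0.0
```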
# Submit the new predictions, following the same process as before
submission_new_features = pd.read_csv('sampleSubmission.csv',parse_dates=["datetime"])
submission_new_features["count"] = predictions_new_features
submission_new_features.to_csv("submission_new_features.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 709kB/s]
Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description           status    publicScore  privateScore
---------------------------  -------------------  --------------------  --------  -----------  ------------
submission_new_features.csv  2024-04-30 17:27:07  new features          complete  0.53553      0.53553
submission.csv               2024-04-30 15:59:33  first raw submission  complete  1.79816      1.79816
New Score of 0.54¶
Step 6: Hyperparameter optimization¶
- There are many options for hyperparameter optimization.
- You can tune AutoGluon's higher-level parameters, or the hyperparameters of the individual models.
- Tuning the individual model hyperparameters requires passing the
hyperparametersandhyperparameter_tune_kwargsarguments tofit().
import autogluon.core as ag
## From autogluon documentation
nn_options = {'num_epochs': 5,
'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),
'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),
# activation function used in NN
'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1)}
gbm_options = [{'extra_trees': True,
'num_boost_round': ag.space.Int(lower=100, upper=500, default=100),
'num_leaves': ag.space.Int(lower=25, upper=64, default=36),
'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge']
hyperparameters = { # hyperparameters of each model type
'GBM': gbm_options,
'NN_TORCH': nn_options,
}
num_trials = 20
search_strategy = 'auto'
scheduler = 'local'
hyperparameter_tune_kwargs = {
'num_trials': num_trials,
'scheduler': scheduler,
'searcher': search_strategy,
}
ignored_columns = ["casual", "registered"]
predictor_new_hpo = TabularPredictor(
label='count',
problem_type="regression",
eval_metric='root_mean_squared_error',
learner_kwargs={'ignored_columns': ignored_columns}
).fit(
train_data=train,
time_limit=600,
presets='best_quality',
hyperparameters=hyperparameters,
hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
refit_full='best')
No path specified. Models will be saved in: "AutogluonModels/ag-20240430_175507"
Presets specified: ['best_quality']
Warning: hyperparameter tuning is currently experimental and may cause the process to hang.
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20240430_175507"
AutoGluon Version: 0.8.2
Python Version: 3.10.14
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Sat Mar 23 09:49:55 UTC 2024
Disk Space Avail: 4.07 GB / 5.36 GB (76.0%)
WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception.
We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows: 10886
Train Data Columns: 14
Label Column: count
Preprocessing data ...
/opt/conda/lib/python3.10/site-packages/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context("mode.use_inf_as_na", True): # treat None, NaN, INF, NINF as NA
Using Feature Generators to preprocess the data ...
Dropping user-specified ignored columns: ['casual', 'registered']
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 1703.88 MB
Train Data (Original) Memory Usage: 0.72 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
0.1s = Fit runtime
12 features in original data used to generate 12 features in processed data.
Train Data (Processed) Memory Usage: 0.53 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.13s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
'GBM': [{'extra_trees': True, 'num_boost_round': Int: lower=100, upper=500, 'num_leaves': Int: lower=25, upper=64, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
'NN_TORCH': {'num_epochs': 5, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'dropout_prob': Real: lower=0.0, upper=0.5},
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 4 L1 models ...
Hyperparameter tuning model: LightGBMXT_BAG_L1 ... Tuning model for up to 89.96s of the 599.87s of remaining time.
0%| | 0/20 [00:00<?, ?it/s]
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 178.251
Ran out of time, early stopping on iteration 18. Best iteration is: [18] valid_set's rmse: 146.682
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 183.459
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 172.787
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 173.311
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 176.531
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 178.583
Ran out of time, early stopping on iteration 1. Best iteration is: [1] valid_set's rmse: 176.485
Stopping HPO to satisfy time limit...
Fitted model: LightGBMXT_BAG_L1/T1 ... -74.2101 = Validation score (-root_mean_squared_error) | 7.26s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T2 ... -44.0175 = Validation score (-root_mean_squared_error) | 10.89s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T3 ... -46.1558 = Validation score (-root_mean_squared_error) | 14.15s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T4 ... -40.2608 = Validation score (-root_mean_squared_error) | 13.86s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T5 ... -64.2305 = Validation score (-root_mean_squared_error) | 12.65s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T6 ... -107.6527 = Validation score (-root_mean_squared_error) | 10.55s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T7 ... -45.1222 = Validation score (-root_mean_squared_error) | 8.88s = Training runtime | 0.0s = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T8 ... -173.5794 = Validation score (-root_mean_squared_error) | 6.54s = Training runtime | 0.0s = Validation runtime
Hyperparameter tuning model: LightGBM_BAG_L1 ... Tuning model for up to 89.96s of the 514.92s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 35.0742 [1000] valid_set's rmse: 34.1338 [2000] valid_set's rmse: 33.9294 [1000] valid_set's rmse: 34.257 [2000] valid_set's rmse: 33.6373 [3000] valid_set's rmse: 33.4395 [4000] valid_set's rmse: 33.4325 [1000] valid_set's rmse: 37.3575 [2000] valid_set's rmse: 37.1945 [1000] valid_set's rmse: 38.1734 [2000] valid_set's rmse: 37.9207 [1000] valid_set's rmse: 33.4459 [2000] valid_set's rmse: 33.2585 [1000] valid_set's rmse: 39.4999 [1000] valid_set's rmse: 36.2444
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 33.7234 [1000] valid_set's rmse: 35.5645
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 35.5817 [1000] valid_set's rmse: 34.1995 [1000] valid_set's rmse: 35.3549
Ran out of time, early stopping on iteration 2110. Best iteration is: [2092] valid_set's rmse: 35.0052
[2000] valid_set's rmse: 35.0222 [1000] valid_set's rmse: 37.4727
Ran out of time, early stopping on iteration 1431. Best iteration is: [1201] valid_set's rmse: 37.3856 Stopping HPO to satisfy time limit... Fitted model: LightGBM_BAG_L1/T1 ... -35.7969 = Validation score (-root_mean_squared_error) 31.59s = Training runtime 0.0s = Validation runtime Fitted model: LightGBM_BAG_L1/T2 ... -35.1776 = Validation score (-root_mean_squared_error) 21.49s = Training runtime 0.0s = Validation runtime Hyperparameter tuning model: NeuralNetTorch_BAG_L1 ... Tuning model for up to 89.96s of the 445.29s of remaining time. Will use custom hpo logic because ray import failed. Reason: ray is required to train folds in parallel for TabularPredictor or HPO for MultiModalPredictor. A quick tip is to install via `pip install ray==2.6.3`
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Ran out of time, stopping training early. (Stopping on epoch 2) Ran out of time, stopping training early. (Stopping on epoch 3) Ran out of time, stopping training early. (Stopping on epoch 3) Ran out of time, stopping training early. (Stopping on epoch 4) Ran out of time, stopping training early. (Stopping on epoch 3) Ran out of time, stopping training early. (Stopping on epoch 3) Ran out of time, stopping training early. (Stopping on epoch 4) Ran out of time, stopping training early. (Stopping on epoch 3) Stopping HPO to satisfy time limit... Fitted model: NeuralNetTorch_BAG_L1/T1 ... -111.1891 = Validation score (-root_mean_squared_error) 19.72s = Training runtime 0.0s = Validation runtime Fitted model: NeuralNetTorch_BAG_L1/T2 ... -69.119 = Validation score (-root_mean_squared_error) 40.08s = Training runtime 0.0s = Validation runtime Fitted model: NeuralNetTorch_BAG_L1/T3 ... -98.4342 = Validation score (-root_mean_squared_error) 23.41s = Training runtime 0.0s = Validation runtime Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 89.96s of the 361.95s of remaining time. Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 33.2738 [1000] valid_set's rmse: 36.4176 [1000] valid_set's rmse: 37.0866 [1000] valid_set's rmse: 32.9432
-35.4416 = Validation score (-root_mean_squared_error) 31.02s = Training runtime 2.08s = Validation runtime Completed 1/20 k-fold bagging repeats ... Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 324.0s of remaining time. -34.3448 = Validation score (-root_mean_squared_error) 0.48s = Training runtime 0.0s = Validation runtime Fitting 4 L2 models ... Hyperparameter tuning model: LightGBMXT_BAG_L2 ... Tuning model for up to 72.79s of the 323.49s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Ran out of time, early stopping on iteration 290. Best iteration is: [60] valid_set's rmse: 39.0607 Stopping HPO to satisfy time limit... Fitted model: LightGBMXT_BAG_L2/T1 ... -36.0951 = Validation score (-root_mean_squared_error) 9.67s = Training runtime 0.0s = Validation runtime Fitted model: LightGBMXT_BAG_L2/T2 ... -35.6461 = Validation score (-root_mean_squared_error) 18.96s = Training runtime 0.0s = Validation runtime Fitted model: LightGBMXT_BAG_L2/T3 ... -35.4969 = Validation score (-root_mean_squared_error) 18.13s = Training runtime 0.0s = Validation runtime Fitted model: LightGBMXT_BAG_L2/T4 ... -36.3084 = Validation score (-root_mean_squared_error) 15.99s = Training runtime 0.0s = Validation runtime Hyperparameter tuning model: LightGBM_BAG_L2 ... Tuning model for up to 72.79s of the 260.62s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy Ran out of time, early stopping on iteration 295. Best iteration is: [75] valid_set's rmse: 37.936 Ran out of time, early stopping on iteration 321. Best iteration is: [98] valid_set's rmse: 37.4381 Ran out of time, early stopping on iteration 384. Best iteration is: [87] valid_set's rmse: 33.1618 Ran out of time, early stopping on iteration 355. Best iteration is: [89] valid_set's rmse: 31.8261 Stopping HPO to satisfy time limit... Fitted model: LightGBM_BAG_L2/T1 ... -34.9601 = Validation score (-root_mean_squared_error) 19.79s = Training runtime 0.0s = Validation runtime Fitted model: LightGBM_BAG_L2/T2 ... -35.4512 = Validation score (-root_mean_squared_error) 24.0s = Training runtime 0.0s = Validation runtime Fitted model: LightGBM_BAG_L2/T3 ... -35.1198 = Validation score (-root_mean_squared_error) 21.08s = Training runtime 0.0s = Validation runtime Hyperparameter tuning model: NeuralNetTorch_BAG_L2 ... Tuning model for up to 72.79s of the 195.64s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 4)
Stopping HPO to satisfy time limit...
Fitted model: NeuralNetTorch_BAG_L2/T1 ...
-36.2868 = Validation score (-root_mean_squared_error)
22.33s = Training runtime
0.0s = Validation runtime
Fitted model: NeuralNetTorch_BAG_L2/T2 ...
-36.7437 = Validation score (-root_mean_squared_error)
38.4s = Training runtime
0.0s = Validation runtime
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 72.79s of the 134.8s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-35.5206 = Validation score (-root_mean_squared_error)
40.75s = Training runtime
0.32s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 92.47s of remaining time.
-34.6103 = Validation score (-root_mean_squared_error)
0.36s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 507.92s ... Best model: "WeightedEnsemble_L2"
Automatically performing refit_full as a post-fit operation (due to `.fit(..., refit_full=True)`
Refitting models via `predictor.refit_full` using all of the data (combined train and validation)...
Models trained in this way will have the suffix "_FULL" and have NaN validation score.
This process is not bound by time_limit, but should take less time than the original `predictor.fit` call.
To learn more, refer to the `.refit_full` method docstring which explains how "_FULL" models differ from normal models.
Fitting 1 L1 models ...
Fitting model: LightGBMXT_BAG_L1/T4_FULL ...
1.37s = Training runtime
Fitting 1 L1 models ...
Fitting model: LightGBM_BAG_L1/T1_FULL ...
2.81s = Training runtime
Fitting 1 L1 models ...
Fitting model: LightGBM_BAG_L1/T2_FULL ...
2.12s = Training runtime
Fitting 1 L1 models ...
Fitting model: LightGBMLarge_BAG_L1_FULL ...
3.27s = Training runtime
Fitting model: WeightedEnsemble_L2_FULL | Skipping fit via cloning parent ...
0.48s = Training runtime
Refit complete, total runtime = 12.73s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240430_175507")
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 -34.344766 2.081915 98.439366 0.000724 0.477755 2 True 15
1 WeightedEnsemble_L3 -34.610302 2.408792 436.919913 0.001114 0.355953 3 True 26
2 LightGBM_BAG_L2/T1 -34.960069 2.083022 271.871469 0.000114 19.786374 2 True 20
3 LightGBM_BAG_L2/T3 -35.119824 2.083172 273.168454 0.000264 21.083359 2 True 22
4 LightGBM_BAG_L1/T2 -35.177633 0.000208 21.494668 0.000208 21.494668 1 True 10
5 LightGBMLarge_BAG_L1 -35.441558 2.080333 31.016630 2.080333 31.016630 1 True 14
6 LightGBM_BAG_L2/T2 -35.451241 2.083111 276.085541 0.000203 24.000446 2 True 21
7 LightGBMXT_BAG_L2/T3 -35.496859 2.083170 270.216143 0.000262 18.131048 2 True 18
8 LightGBMLarge_BAG_L2 -35.520580 2.406602 292.831844 0.323694 40.746750 2 True 25
9 LightGBMXT_BAG_L2/T2 -35.646129 2.083090 271.040843 0.000182 18.955749 2 True 17
10 LightGBM_BAG_L1/T1 -35.796869 0.000424 31.587348 0.000424 31.587348 1 True 9
11 LightGBMXT_BAG_L2/T1 -36.095050 2.083062 261.756181 0.000154 9.671086 2 True 16
12 NeuralNetTorch_BAG_L2/T1 -36.286784 2.083046 274.419576 0.000139 22.334481 2 True 23
13 LightGBMXT_BAG_L2/T4 -36.308433 2.083034 268.076229 0.000126 15.991134 2 True 19
14 NeuralNetTorch_BAG_L2/T2 -36.743746 2.083002 290.481501 0.000094 38.396406 2 True 24
15 LightGBMXT_BAG_L1/T4 -40.260847 0.000226 13.862965 0.000226 13.862965 1 True 4
16 LightGBMXT_BAG_L1/T2 -44.017457 0.000293 10.890561 0.000293 10.890561 1 True 2
17 LightGBMXT_BAG_L1/T7 -45.122216 0.000200 8.877531 0.000200 8.877531 1 True 7
18 LightGBMXT_BAG_L1/T3 -46.155826 0.000209 14.146071 0.000209 14.146071 1 True 3
19 LightGBMXT_BAG_L1/T5 -64.230519 0.000289 12.647782 0.000289 12.647782 1 True 5
20 NeuralNetTorch_BAG_L1/T2 -69.119028 0.000106 40.080826 0.000106 40.080826 1 True 12
21 LightGBMXT_BAG_L1/T1 -74.210139 0.000178 7.256516 0.000178 7.256516 1 True 1
22 NeuralNetTorch_BAG_L1/T3 -98.434224 0.000085 23.409277 0.000085 23.409277 1 True 13
23 LightGBMXT_BAG_L1/T6 -107.652738 0.000185 10.550715 0.000185 10.550715 1 True 6
24 NeuralNetTorch_BAG_L1/T1 -111.189095 0.000092 19.722893 0.000092 19.722893 1 True 11
25 LightGBMXT_BAG_L1/T8 -173.579383 0.000080 6.541312 0.000080 6.541312 1 True 8
26 WeightedEnsemble_L2_FULL NaN NaN 10.057837 NaN 0.477755 2 True 31
27 LightGBM_BAG_L1/T2_FULL NaN NaN 2.124444 NaN 2.124444 1 True 29
28 LightGBM_BAG_L1/T1_FULL NaN NaN 2.813788 NaN 2.813788 1 True 28
29 LightGBMXT_BAG_L1/T4_FULL NaN NaN 1.369308 NaN 1.369308 1 True 27
30 LightGBMLarge_BAG_L1_FULL NaN NaN 3.272542 NaN 3.272542 1 True 30
Number of models trained: 31
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_TabularNeuralNetTorch', 'WeightedEnsembleModel'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
*** End of fit() summary ***
/opt/conda/lib/python3.10/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
{'model_types': {'LightGBMXT_BAG_L1/T1': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T2': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T3': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T4': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T5': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T6': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T7': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L1/T8': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB',
'NeuralNetTorch_BAG_L1/T1': 'StackerEnsembleModel_TabularNeuralNetTorch',
'NeuralNetTorch_BAG_L1/T2': 'StackerEnsembleModel_TabularNeuralNetTorch',
'NeuralNetTorch_BAG_L1/T3': 'StackerEnsembleModel_TabularNeuralNetTorch',
'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2/T1': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L2/T2': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L2/T3': 'StackerEnsembleModel_LGB',
'LightGBMXT_BAG_L2/T4': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T3': 'StackerEnsembleModel_LGB',
'NeuralNetTorch_BAG_L2/T1': 'StackerEnsembleModel_TabularNeuralNetTorch',
'NeuralNetTorch_BAG_L2/T2': 'StackerEnsembleModel_TabularNeuralNetTorch',
'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L3': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L1/T4_FULL': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T1_FULL': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T2_FULL': 'StackerEnsembleModel_LGB',
'LightGBMLarge_BAG_L1_FULL': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L2_FULL': 'WeightedEnsembleModel'},
'model_performance': {'LightGBMXT_BAG_L1/T1': -74.21013926644432,
'LightGBMXT_BAG_L1/T2': -44.01745745060116,
'LightGBMXT_BAG_L1/T3': -46.15582620981919,
'LightGBMXT_BAG_L1/T4': -40.260846760666986,
'LightGBMXT_BAG_L1/T5': -64.23051938827788,
'LightGBMXT_BAG_L1/T6': -107.65273844769426,
'LightGBMXT_BAG_L1/T7': -45.122215594893575,
'LightGBMXT_BAG_L1/T8': -173.57938270014978,
'LightGBM_BAG_L1/T1': -35.79686905713535,
'LightGBM_BAG_L1/T2': -35.17763297870543,
'NeuralNetTorch_BAG_L1/T1': -111.18909527137816,
'NeuralNetTorch_BAG_L1/T2': -69.1190275792771,
'NeuralNetTorch_BAG_L1/T3': -98.43422375391974,
'LightGBMLarge_BAG_L1': -35.44155831267077,
'WeightedEnsemble_L2': -34.344765702866304,
'LightGBMXT_BAG_L2/T1': -36.095050466515595,
'LightGBMXT_BAG_L2/T2': -35.646128926904275,
'LightGBMXT_BAG_L2/T3': -35.49685877685132,
'LightGBMXT_BAG_L2/T4': -36.30843291209455,
'LightGBM_BAG_L2/T1': -34.960068573845525,
'LightGBM_BAG_L2/T2': -35.4512410851161,
'LightGBM_BAG_L2/T3': -35.11982432925401,
'NeuralNetTorch_BAG_L2/T1': -36.286784367424325,
'NeuralNetTorch_BAG_L2/T2': -36.74374583841612,
'LightGBMLarge_BAG_L2': -35.5205798267368,
'WeightedEnsemble_L3': -34.61030161915108,
'LightGBMXT_BAG_L1/T4_FULL': None,
'LightGBM_BAG_L1/T1_FULL': None,
'LightGBM_BAG_L1/T2_FULL': None,
'LightGBMLarge_BAG_L1_FULL': None,
'WeightedEnsemble_L2_FULL': None},
'model_best': 'WeightedEnsemble_L2',
'model_paths': {'LightGBMXT_BAG_L1/T1': ['LightGBMXT_BAG_L1', 'T1'],
'LightGBMXT_BAG_L1/T2': ['LightGBMXT_BAG_L1', 'T2'],
'LightGBMXT_BAG_L1/T3': ['LightGBMXT_BAG_L1', 'T3'],
'LightGBMXT_BAG_L1/T4': ['LightGBMXT_BAG_L1', 'T4'],
'LightGBMXT_BAG_L1/T5': ['LightGBMXT_BAG_L1', 'T5'],
'LightGBMXT_BAG_L1/T6': ['LightGBMXT_BAG_L1', 'T6'],
'LightGBMXT_BAG_L1/T7': ['LightGBMXT_BAG_L1', 'T7'],
'LightGBMXT_BAG_L1/T8': ['LightGBMXT_BAG_L1', 'T8'],
'LightGBM_BAG_L1/T1': ['LightGBM_BAG_L1', 'T1'],
'LightGBM_BAG_L1/T2': ['LightGBM_BAG_L1', 'T2'],
'NeuralNetTorch_BAG_L1/T1': ['NeuralNetTorch_BAG_L1', 'T1'],
'NeuralNetTorch_BAG_L1/T2': ['NeuralNetTorch_BAG_L1', 'T2'],
'NeuralNetTorch_BAG_L1/T3': ['NeuralNetTorch_BAG_L1', 'T3'],
'LightGBMLarge_BAG_L1': ['LightGBMLarge_BAG_L1'],
'WeightedEnsemble_L2': ['WeightedEnsemble_L2'],
'LightGBMXT_BAG_L2/T1': ['LightGBMXT_BAG_L2', 'T1'],
'LightGBMXT_BAG_L2/T2': ['LightGBMXT_BAG_L2', 'T2'],
'LightGBMXT_BAG_L2/T3': ['LightGBMXT_BAG_L2', 'T3'],
'LightGBMXT_BAG_L2/T4': ['LightGBMXT_BAG_L2', 'T4'],
'LightGBM_BAG_L2/T1': ['LightGBM_BAG_L2', 'T1'],
'LightGBM_BAG_L2/T2': ['LightGBM_BAG_L2', 'T2'],
'LightGBM_BAG_L2/T3': ['LightGBM_BAG_L2', 'T3'],
'NeuralNetTorch_BAG_L2/T1': ['NeuralNetTorch_BAG_L2', 'T1'],
'NeuralNetTorch_BAG_L2/T2': ['NeuralNetTorch_BAG_L2', 'T2'],
'LightGBMLarge_BAG_L2': ['LightGBMLarge_BAG_L2'],
'WeightedEnsemble_L3': ['WeightedEnsemble_L3'],
'LightGBMXT_BAG_L1/T4_FULL': ['LightGBMXT_BAG_L1', 'T4_FULL'],
'LightGBM_BAG_L1/T1_FULL': ['LightGBM_BAG_L1', 'T1_FULL'],
'LightGBM_BAG_L1/T2_FULL': ['LightGBM_BAG_L1', 'T2_FULL'],
'LightGBMLarge_BAG_L1_FULL': ['LightGBMLarge_BAG_L1_FULL'],
'WeightedEnsemble_L2_FULL': ['WeightedEnsemble_L2_FULL']},
'model_fit_times': {'LightGBMXT_BAG_L1/T1': 7.256516218185425,
'LightGBMXT_BAG_L1/T2': 10.890560626983643,
'LightGBMXT_BAG_L1/T3': 14.146070718765259,
'LightGBMXT_BAG_L1/T4': 13.862964868545532,
'LightGBMXT_BAG_L1/T5': 12.64778184890747,
'LightGBMXT_BAG_L1/T6': 10.55071473121643,
'LightGBMXT_BAG_L1/T7': 8.877530813217163,
'LightGBMXT_BAG_L1/T8': 6.541312217712402,
'LightGBM_BAG_L1/T1': 31.587348222732544,
'LightGBM_BAG_L1/T2': 21.494667530059814,
'NeuralNetTorch_BAG_L1/T1': 19.722893238067627,
'NeuralNetTorch_BAG_L1/T2': 40.08082604408264,
'NeuralNetTorch_BAG_L1/T3': 23.40927743911743,
'LightGBMLarge_BAG_L1': 31.016630172729492,
'WeightedEnsemble_L2': 0.4777553081512451,
'LightGBMXT_BAG_L2/T1': 9.671086311340332,
'LightGBMXT_BAG_L2/T2': 18.955748796463013,
'LightGBMXT_BAG_L2/T3': 18.131048440933228,
'LightGBMXT_BAG_L2/T4': 15.991134405136108,
'LightGBM_BAG_L2/T1': 19.78637433052063,
'LightGBM_BAG_L2/T2': 24.00044584274292,
'LightGBM_BAG_L2/T3': 21.083359241485596,
'NeuralNetTorch_BAG_L2/T1': 22.334481477737427,
'NeuralNetTorch_BAG_L2/T2': 38.396406412124634,
'LightGBMLarge_BAG_L2': 40.74674963951111,
'WeightedEnsemble_L3': 0.3559529781341553,
'LightGBMXT_BAG_L1/T4_FULL': 1.3693079948425293,
'LightGBM_BAG_L1/T1_FULL': 2.8137881755828857,
'LightGBM_BAG_L1/T2_FULL': 2.124443769454956,
'LightGBMLarge_BAG_L1_FULL': 3.2725419998168945,
'WeightedEnsemble_L2_FULL': 0.4777553081512451},
'model_pred_times': {'LightGBMXT_BAG_L1/T1': 0.0001780986785888672,
'LightGBMXT_BAG_L1/T2': 0.0002930164337158203,
'LightGBMXT_BAG_L1/T3': 0.00020933151245117188,
'LightGBMXT_BAG_L1/T4': 0.00022554397583007812,
'LightGBMXT_BAG_L1/T5': 0.00028896331787109375,
'LightGBMXT_BAG_L1/T6': 0.0001850128173828125,
'LightGBMXT_BAG_L1/T7': 0.0002002716064453125,
'LightGBMXT_BAG_L1/T8': 7.987022399902344e-05,
'LightGBM_BAG_L1/T1': 0.00042438507080078125,
'LightGBM_BAG_L1/T2': 0.00020766258239746094,
'NeuralNetTorch_BAG_L1/T1': 9.202957153320312e-05,
'NeuralNetTorch_BAG_L1/T2': 0.00010585784912109375,
'NeuralNetTorch_BAG_L1/T3': 8.511543273925781e-05,
'LightGBMLarge_BAG_L1': 2.0803327560424805,
'WeightedEnsemble_L2': 0.0007243156433105469,
'LightGBMXT_BAG_L2/T1': 0.00015425682067871094,
'LightGBMXT_BAG_L2/T2': 0.0001823902130126953,
'LightGBMXT_BAG_L2/T3': 0.0002617835998535156,
'LightGBMXT_BAG_L2/T4': 0.000125885009765625,
'LightGBM_BAG_L2/T1': 0.00011372566223144531,
'LightGBM_BAG_L2/T2': 0.0002033710479736328,
'LightGBM_BAG_L2/T3': 0.00026416778564453125,
'NeuralNetTorch_BAG_L2/T1': 0.0001385211944580078,
'NeuralNetTorch_BAG_L2/T2': 9.417533874511719e-05,
'LightGBMLarge_BAG_L2': 0.32369399070739746,
'WeightedEnsemble_L3': 0.0011143684387207031,
'LightGBMXT_BAG_L1/T4_FULL': None,
'LightGBM_BAG_L1/T1_FULL': None,
'LightGBM_BAG_L1/T2_FULL': None,
'LightGBMLarge_BAG_L1_FULL': None,
'WeightedEnsemble_L2_FULL': None},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'LightGBMXT_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T4': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T5': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T6': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T7': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T8': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L1/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2/T4': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L2/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L1/T4_FULL': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T1_FULL': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T2_FULL': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L1_FULL': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2_FULL': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L2 -34.344766 2.081915 98.439366
1 WeightedEnsemble_L3 -34.610302 2.408792 436.919913
2 LightGBM_BAG_L2/T1 -34.960069 2.083022 271.871469
3 LightGBM_BAG_L2/T3 -35.119824 2.083172 273.168454
4 LightGBM_BAG_L1/T2 -35.177633 0.000208 21.494668
5 LightGBMLarge_BAG_L1 -35.441558 2.080333 31.016630
6 LightGBM_BAG_L2/T2 -35.451241 2.083111 276.085541
7 LightGBMXT_BAG_L2/T3 -35.496859 2.083170 270.216143
8 LightGBMLarge_BAG_L2 -35.520580 2.406602 292.831844
9 LightGBMXT_BAG_L2/T2 -35.646129 2.083090 271.040843
10 LightGBM_BAG_L1/T1 -35.796869 0.000424 31.587348
11 LightGBMXT_BAG_L2/T1 -36.095050 2.083062 261.756181
12 NeuralNetTorch_BAG_L2/T1 -36.286784 2.083046 274.419576
13 LightGBMXT_BAG_L2/T4 -36.308433 2.083034 268.076229
14 NeuralNetTorch_BAG_L2/T2 -36.743746 2.083002 290.481501
15 LightGBMXT_BAG_L1/T4 -40.260847 0.000226 13.862965
16 LightGBMXT_BAG_L1/T2 -44.017457 0.000293 10.890561
17 LightGBMXT_BAG_L1/T7 -45.122216 0.000200 8.877531
18 LightGBMXT_BAG_L1/T3 -46.155826 0.000209 14.146071
19 LightGBMXT_BAG_L1/T5 -64.230519 0.000289 12.647782
20 NeuralNetTorch_BAG_L1/T2 -69.119028 0.000106 40.080826
21 LightGBMXT_BAG_L1/T1 -74.210139 0.000178 7.256516
22 NeuralNetTorch_BAG_L1/T3 -98.434224 0.000085 23.409277
23 LightGBMXT_BAG_L1/T6 -107.652738 0.000185 10.550715
24 NeuralNetTorch_BAG_L1/T1 -111.189095 0.000092 19.722893
25 LightGBMXT_BAG_L1/T8 -173.579383 0.000080 6.541312
26 WeightedEnsemble_L2_FULL NaN NaN 10.057837
27 LightGBM_BAG_L1/T2_FULL NaN NaN 2.124444
28 LightGBM_BAG_L1/T1_FULL NaN NaN 2.813788
29 LightGBMXT_BAG_L1/T4_FULL NaN NaN 1.369308
30 LightGBMLarge_BAG_L1_FULL NaN NaN 3.272542
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000724 0.477755 2 True
1 0.001114 0.355953 3 True
2 0.000114 19.786374 2 True
3 0.000264 21.083359 2 True
4 0.000208 21.494668 1 True
5 2.080333 31.016630 1 True
6 0.000203 24.000446 2 True
7 0.000262 18.131048 2 True
8 0.323694 40.746750 2 True
9 0.000182 18.955749 2 True
10 0.000424 31.587348 1 True
11 0.000154 9.671086 2 True
12 0.000139 22.334481 2 True
13 0.000126 15.991134 2 True
14 0.000094 38.396406 2 True
15 0.000226 13.862965 1 True
16 0.000293 10.890561 1 True
17 0.000200 8.877531 1 True
18 0.000209 14.146071 1 True
19 0.000289 12.647782 1 True
20 0.000106 40.080826 1 True
21 0.000178 7.256516 1 True
22 0.000085 23.409277 1 True
23 0.000185 10.550715 1 True
24 0.000092 19.722893 1 True
25 0.000080 6.541312 1 True
26 NaN 0.477755 2 True
27 NaN 2.124444 1 True
28 NaN 2.813788 1 True
29 NaN 1.369308 1 True
30 NaN 3.272542 1 True
fit_order
0 15
1 26
2 20
3 22
4 10
5 14
6 21
7 18
8 25
9 17
10 9
11 16
12 23
13 19
14 24
15 4
16 2
17 7
18 3
19 5
20 12
21 1
22 13
23 6
24 11
25 8
26 31
27 29
28 28
29 27
30 30 }
leaderboard_new_hpo_df = pd.DataFrame(predictor_new_hpo.leaderboard(silent=True))
leaderboard_new_hpo_df.plot(kind="bar", x="model", y="score_val", figsize=(12, 6))
plt.ylabel("Validation score (negative RMSE)")
plt.show()
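Since AutoGluon reports `score_val` as the *negative* RMSE (higher is better), flipping the sign before plotting can make the chart easier to read. A minimal sketch on a hypothetical two-row leaderboard (not the real `leaderboard_new_hpo_df`):

```python
import pandas as pd

# Hypothetical miniature leaderboard with AutoGluon's sign convention:
# score_val is the negative RMSE, so values closer to 0 are better.
lb = pd.DataFrame({
    "model": ["WeightedEnsemble_L2", "LightGBM_BAG_L2/T1"],
    "score_val": [-34.344766, -34.960069],
})

# Flip the sign to get plain RMSE for plotting
lb["rmse"] = -lb["score_val"]
print(lb.sort_values("rmse").head())
```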
predictions_new_hpo = predictor_new_hpo.predict(test)
predictions_new_hpo.head()
0 17.825603 1 4.653209 2 2.432260 3 2.359294 4 1.918092 Name: count, dtype: float32
predictions_new_hpo.describe()
count 6493.000000 mean 190.154205 std 174.060486 min -16.014023 25% 45.448490 50% 148.374146 75% 283.098511 max 924.390259 Name: count, dtype: float64
# Remember to set all negative values to zero
predictions_new_hpo[predictions_new_hpo<0] = 0
predictions_new_hpo.describe()
count 6493.000000 mean 190.181229 std 174.030411 min 0.000000 25% 45.448490 50% 148.374146 75% 283.098511 max 924.390259 Name: count, dtype: float64
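The elementwise assignment above works; pandas' `clip` does the same thing in one call. A self-contained sketch on dummy values (standing in for the real prediction series):

```python
import pandas as pd

# Dummy stand-in for the prediction series
preds = pd.Series([17.8, -16.0, 2.4, -0.5, 148.3], name="count")

# Kaggle's RMSLE metric rejects negative counts, so floor them at zero
preds = preds.clip(lower=0)
print(preds.min())  # → 0.0
```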
# Submit predictions to Kaggle the same way as before
submission_new_hpo = pd.read_csv('sampleSubmission.csv', parse_dates = ['datetime'])
submission_new_hpo["count"] = predictions_new_hpo
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
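Before uploading, it can be worth sanity-checking the submission frame. A sketch on a hypothetical three-row frame standing in for `submission_new_hpo`:

```python
import pandas as pd

# Hypothetical submission frame (the real one has 6493 rows)
submission = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2011-01-20 00:00", "2011-01-20 01:00", "2011-01-20 02:00"]
    ),
    "count": [17.8, 0.0, 2.4],
})

# The competition expects exactly these two columns and no negative counts
assert list(submission.columns) == ["datetime", "count"]
assert (submission["count"] >= 0).all()
```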
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 701kB/s] Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName date description status publicScore privateScore --------------------------- ------------------- --------------------------------- -------- ----------- ------------ submission_new_hpo.csv 2024-04-30 18:14:06 new features with hyperparameters complete 0.51062 0.51062 submission_new_features.csv 2024-04-30 17:27:07 new features complete 0.53553 0.53553 submission.csv 2024-04-30 15:59:33 first raw submission complete 1.79816 1.79816
New Score of 0.51¶
# Take the top model validation score from each training run and create a line plot to show improvement
# You can create these in the notebook and save them to PNG, or use another tool (e.g. Google Sheets, Excel)
fig = pd.DataFrame(
{
"model": ["initial", "add_features", "hpo"],
"score": [52.769, 34.079, 34.344]
}
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_train_score.png')
# Take the 3 Kaggle scores and create a line plot to show improvement
fig = pd.DataFrame(
{
"test_eval": ["initial", "add_features", "hpo"],
"score": [1.798, 0.535, 0.510]
}
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_test_score.png')
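To put the three Kaggle scores in perspective, the relative improvement over the initial raw submission can be computed directly:

```python
# Kaggle RMSLE scores from the three submissions (lower is better)
scores = {"initial": 1.798, "add_features": 0.535, "hpo": 0.510}

# Percent improvement of each stage relative to the initial raw submission
baseline = scores["initial"]
improvement = {k: round(100 * (baseline - v) / baseline, 1) for k, v in scores.items()}
print(improvement)
```

Feature engineering accounts for most of the gain; hyperparameter tuning adds a smaller final improvement.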
Hyperparameter table¶
# The 3 hyperparameter settings we tuned, with the Kaggle score as the result
pd.DataFrame({
"model": ["initial", "add_features", "hpo (top-hpo-model: hpo2)"],
"hpo1": ["prescribed_values", "prescribed_values", "Tree-Based Models: (GBM, XT, XGB & RF)"],
"hpo2": ["prescribed_values", "prescribed_values", "KNN"],
"hpo3": ["presets: 'high quality' (auto_stack=True)", "presets: 'high quality' (auto_stack=True)", "presets: 'optimize_for_deployment'"],
"score": [1.798, 0.535, 0.510]
})
| model | hpo1 | hpo2 | hpo3 | score | |
|---|---|---|---|---|---|
| 0 | initial | prescribed_values | prescribed_values | presets: 'high quality' (auto_stack=True) | 1.798 |
| 1 | add_features | prescribed_values | prescribed_values | presets: 'high quality' (auto_stack=True) | 0.535 |
| 2 | hpo (top-hpo-model: hpo2) | Tree-Based Models: (GBM, XT, XGB & RF) | KNN | presets: 'optimize_for_deployment' | 0.510 |
import matplotlib.pyplot as plt

def plot_series(time, series, fmt="-", start=0, end=None, label=None):
    plt.plot(time[start:end], series[start:end], fmt, label=label)
    plt.xlabel("Time")
    plt.ylabel("Value")
    if label:
        plt.legend(fontsize=14)
    plt.grid(True)
import matplotlib.pyplot as plt
series = train["count"].to_numpy()
time = train["hour"].to_numpy()
plt.figure(figsize=(100, 15))
plot_series(time, series)
plt.title("Train Data time series graph")
#plot_series(time1, series1)
plt.show()
import matplotlib.pyplot as plt
series = train["count"].to_numpy()
time = train["month"].to_numpy()
plt.figure(figsize=(100, 15))
plot_series(time, series)
plt.title("Train Data time series graph")
#plot_series(time1, series1)
plt.show()
sub_new = pd.read_csv('submission_new_hpo.csv')
sub_new.loc[:, "datetime"] = pd.to_datetime(sub_new.loc[:, "datetime"])
series1 = sub_new["count"].to_numpy()
time1 = sub_new["datetime"].to_numpy()
plt.figure(figsize=(350, 15))
#plot_series(time, series)
plot_series(time1, series1)
plt.title("Test Data time series graph")
plt.show()
train.drop(['casual', 'registered', 'month', 'windspeed'], axis=1, inplace=True)
train.head()
| season | holiday | workingday | weather | temp | atemp | humidity | count | year | day | hour | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 16 | 2011 | 5 | 0 |
| 1 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 40 | 2011 | 5 | 1 |
| 2 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 32 | 2011 | 5 | 2 |
| 3 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 13 | 2011 | 5 | 3 |
| 4 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 1 | 2011 | 5 | 4 |
test.drop(['month', 'windspeed'], axis=1, inplace=True)
test.head()
| season | holiday | workingday | weather | temp | atemp | humidity | year | day | hour | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 2011 | 3 | 0 |
| 1 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 2011 | 3 | 1 |
| 2 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 2011 | 3 | 2 |
| 3 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 2011 | 3 | 3 |
| 4 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 2011 | 3 | 4 |
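One caveat with the in-place drops above: re-running the cell raises a `KeyError` because the columns are already gone. Passing `errors="ignore"` makes the drop idempotent. A sketch on a dummy frame:

```python
import pandas as pd

df = pd.DataFrame({"month": [1], "windspeed": [0.0], "temp": [9.8]})

# First drop removes the columns; errors="ignore" makes a re-run a no-op
# instead of raising KeyError on the now-missing labels.
df = df.drop(["month", "windspeed"], axis=1, errors="ignore")
df = df.drop(["month", "windspeed"], axis=1, errors="ignore")  # safe re-run
print(list(df.columns))  # → ['temp']
```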
# train['count'] has no missing values and test has no 'count' column,
# so split features and target directly:
X_train = train.drop(['count'], axis=1)
y_train = train['count']
X_test = test.copy()
import numpy as np
from sklearn import metrics

def rmsle(y_true, y_pred, convertExp=True):
    # Undo the log transform applied to the target before scoring
    if convertExp:
        y_true = np.exp(y_true)
        y_pred = np.exp(y_pred)
    log_true = np.nan_to_num(np.log(y_true + 1))
    log_pred = np.nan_to_num(np.log(y_pred + 1))
    return np.sqrt(np.mean((log_true - log_pred) ** 2))
rmsle_scorer = metrics.make_scorer(rmsle, greater_is_better=False)
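A quick sanity check of the metric (re-defining `rmsle` locally so the snippet runs standalone): identical inputs score exactly 0, and the `convertExp` flag undoes the log transform before scoring.

```python
import numpy as np

def rmsle(y_true, y_pred, convertExp=True):
    # Undo the log transform applied to the target before scoring
    if convertExp:
        y_true = np.exp(y_true)
        y_pred = np.exp(y_pred)
    log_true = np.nan_to_num(np.log(y_true + 1))
    log_pred = np.nan_to_num(np.log(y_pred + 1))
    return np.sqrt(np.mean((log_true - log_pred) ** 2))

y = np.log(np.array([16.0, 40.0, 32.0]))  # log-scale targets
print(rmsle(y, y))  # → 0.0 for a perfect prediction
```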
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
randomforest = RandomForestRegressor()
rf_params = {'random_state': [42], 'n_estimators': [10, 20, 140]}
gridsearch_random_forest = GridSearchCV(estimator=randomforest,
param_grid=rf_params,
scoring=rmsle_scorer,
cv=5)
log_y = np.log(y_train)
gridsearch_random_forest.fit(X_train, log_y)
print(f'Best Parameter: {gridsearch_random_forest.best_params_}')
Best Parameter: {'n_estimators': 140, 'random_state': 42}
train_preds = gridsearch_random_forest.best_estimator_.predict(X_train)
print(f'RMSLE of random forest: {rmsle(log_y, train_preds, True):.4f}')
RMSLE of random forest: 0.1125
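Note that because the forest was fit on `log(count)`, any test-time predictions come back on the log scale and would need exponentiating (and flooring at zero) before they could be submitted. A minimal numpy sketch with hypothetical log-scale values:

```python
import numpy as np

# Hypothetical log-scale predictions, as produced by a model fit on log(count)
log_preds = np.array([2.77, 3.69, 0.05])

# Map back to the original count scale and floor at zero for the metric
counts = np.clip(np.exp(log_preds), 0, None)
print(counts.round(2))
```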